Contents
- Figures
- Introduction
- Boot Sequence Flowchart
- Disk Partitioning Schemes
- PC Firmware
- Boot Loaders
- GRand Unified Boot-loader (GRUB)
- Operating System
- Initial RAM disk (initrd)
- Init Daemon
- GRUB Install
- GRUB Update
Figures
↥Introduction
I have created this guide to aid in understanding and trouble-shooting boot-time issues with Linux. In a companion article I shall provide an extensive trouble-shooting flow-chart to help identify and solve problems quickly.
This page uses pure HTML5/CSS3 and Scalable Vector Graphics (SVG) for its illustrations. SVG allows perfect rendering of images and text at any resolution or zoom factor. Most modern browsers should correctly support SVG - please let me know if you use a browser that can't render the illustrations correctly.
There is considerable confusion amongst users of Intel/AMD CPU-based PCs, support agents, and even engineers over how the seemingly voodoo process of booting a PC occurs. As Basic Input Output System (BIOS) firmware has been replaced by the widely adopted Unified Extensible Firmware Interface (UEFI), the confusion and misinformation have multiplied, leading to incorrect recommendations from otherwise knowledgeable people and distrust of UEFI in others.
Prior to 2007 (U)EFI was mainly the province of Intel servers. When Microsoft announced in 2011 that Certified Windows 8 PCs would require UEFI Secure Boot to be enabled it pushed motherboard and systems manufacturers to adopt UEFI across their entire range if they wanted to use the Microsoft Windows "Windows Ready" trademarks and marketing materials.
As a result the Linux development community, from the kernel, boot-loader, tools and installer developers to the distribution packagers, was forced to react so that Linux would not be prevented from installing on a new PC. A concerted effort was made, and continues, to cope with poorly written and customised UEFI firmware images, from the motherboard manufacturers in particular.
These motherboard bugs cause many problems for users wishing to install Linux on their PCs. Those users frequently bring their problems to the support forums and Internet Relay Chat (IRC) channels for help. This guide has been developed in response to my experiences of repeatedly explaining and fixing the boot process for users in the #ubuntu support channels.
↥Boot Sequence Flowchart
[Figure: Boot Sequence Flowchart, starting from PC POWER ON]
↥Disk Partitioning Schemes
The Disk Label, also called the partitioning scheme or Partition Table (PT), is a small block of data that describes how the sectors (each containing a fixed number of bytes) on the device are grouped into logically separated partitions. Each partition can contain some operating-system defined data structure, such as: a file-system, swap (memory paging) space, a Logical Volume Management (LVM) Physical Volume (PV), an encrypted volume such as dm-crypt/Linux Unified Key Setup (LUKS) or TrueCrypt, a Redundant Array of Independent (formerly Inexpensive) Disks (RAID) member, or any other type of data.
The partition table will store information about each partition, such as the starting sector offset (sector 0 is the first), the number of sectors it uses, a type indicator (indicates but doesn't guarantee the type of data in the partition), and flags that indicate condition, such as the active bootable partition.
Historically PC disk drives used physical sectors containing 512 bytes of data. As drive capacities have grown into the terabyte range (1,000,000,000,000 bytes) the physical sectors have increased in size and for large drives are typically 2,048 bytes or 4,096 bytes.
To maintain backwards compatibility these large-sector drives usually have a compatibility mode that will read and write 512-byte logical sectors - internally this is achieved by reading/writing the entire, larger, physical sector that contains the required 512-byte logical sector.
Partitioning schemes therefore need a way to indicate what size sectors their offsets and sizes are measured in - sector 4 of a 512 bytes per sector drive is 2,048 bytes into the drive, whereas on a 2,048 bytes-per-sector drive it is 8,192 bytes into the drive. When operating systems are aware of larger sectors the logical sectors can be the same size as the physical, which improves performance and reduces complexity.
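The offset arithmetic above can be sketched in a few lines of shell. The function name lba_to_bytes is made up for this illustration; on a real system the sector sizes could be read with blockdev --getss (logical) and blockdev --getpbsz (physical).

```shell
#!/bin/sh
# lba_to_bytes LBA SECTOR_SIZE
# Convert a logical sector number into a byte offset from the start of the device.
lba_to_bytes() {
    echo $(( $1 * $2 ))
}

lba_to_bytes 4 512     # prints 2048
lba_to_bytes 4 2048    # prints 8192
```

The same partition table entry therefore points to a different place on the device depending on which sector size it is interpreted with.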
Spinning platter disks (known as Hard Disk Drives - HDD) are now being superseded by Solid State Drives (SSD) using 'Flash' memory technology. Despite these devices having an entirely different physical construction and memory access method, they are often referred to as 'disks' and most of the rotating-disk-platter conventions are used in describing their use.
↥Master Boot Record (MBR)
Sector 0 of the device effectively contains three structures:
- Boot-strap: The first 440 bytes of the MBR contain executable code and some hard-coded configuration. This boot-strap code is installed by the system Boot Loader (e.g. GRUB) and is just sufficient to load much more extensive executable code from another location on the disk. The 6 bytes between the boot-strap and the partition table hold an optional disk signature.
- Primary Partition Table: Four partition entries; each is a 16-byte structure (64 bytes total). The table starts 66 bytes from the end of the 512-byte sector (byte offset 446). Each structure contains the offset (from the start of the disk) to the first sector of the partition in logical sectors, the number of logical sectors used, the partition type code (indicating what the contents of the partition are expected to be), and a boot flag.
- Signature: The final 2 bytes of the sector store a signature, the byte 0x55 followed by the byte 0xAA (hexadecimal, base 16), which when found indicates to the BIOS that a valid partition table can be found in the sector.
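As a rough illustration of that layout, the sketch below reads sector 0 of a disk image (or block device) with dd and od and prints the signature and any non-empty primary partition entries. The function name mbr_dump is hypothetical, and it assumes the classic entry layout: boot flag at byte 0, type code at byte 4, little-endian start sector at bytes 8-11 and sector count at bytes 12-15.

```shell
#!/bin/sh
# mbr_dump IMAGE
# Print the MBR signature and the non-empty primary partition entries
# found in sector 0 of IMAGE (a disk image file or a block device).
mbr_dump() {
    img=$1
    # Bytes 510-511 hold the signature; od prints them as "55 aa".
    sig=$(dd if="$img" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$sig" != "55aa" ]; then
        echo "no MBR signature (found: ${sig:-nothing})"
        return 1
    fi
    echo "signature: 55aa"
    n=0
    while [ "$n" -lt 4 ]; do
        off=$(( 446 + n * 16 ))
        # Load the 16 bytes of this entry into $1..$16 as decimal values.
        set -- $(dd if="$img" bs=1 skip="$off" count=16 2>/dev/null | od -An -tu1)
        flag=$1
        type=$5
        # Start sector and sector count are 32-bit little-endian numbers.
        start=$(( $9 + ${10} * 256 + ${11} * 65536 + ${12} * 16777216 ))
        count=$(( ${13} + ${14} * 256 + ${15} * 65536 + ${16} * 16777216 ))
        if [ "$type" -ne 0 ]; then
            printf 'entry %d: boot=0x%02x type=0x%02x start=%d sectors=%d\n' \
                "$n" "$flag" "$type" "$start" "$count"
        fi
        n=$(( n + 1 ))
    done
}

# Example (needs read access to the device, usually root):
#   mbr_dump /dev/sda
```

A boot flag of 0x80 marks the active partition; a type code of 0x00 marks an unused entry, which the sketch skips.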
As drive capacities have increased, 4 primary partitions were found to be insufficient. Additional logical partitions can be created inside an Extended Partition. One of the Primary Partition Table's entries describes the location of the first Extended Partition Table, which uses the same 4-entry layout but normally holds 1 logical partition entry and an entry linking to another Extended Partition Table. This linking can continue quite deep; I think Linux can support up to 127 such partitions.
↥Globally Unique Identifier Partition Table (GPT)
To overcome the limitations and complexity of MBR and Extended Partitions, the GUID Partition Table specification was developed. This accommodates 128 partitions by default, with no distinction between primary and logical partitions, and that number can be increased. A back-up copy is placed at the end of the device to help recovery from a corrupted primary GPT.
It uses Globally Unique Identifiers (GUID) to identify partition types. GUIDs have the form C12A7328-F81F-11D2-BA4B-00A0C93EC93B, which is easy for the computer to handle but difficult for people, so the primary Linux tools in the gdisk package use hexadecimal short-codes representing a 2-byte number, e.g. an EFI System Partition (ESP) is 0xEF00. Tools such as gdisk, sgdisk, and cgdisk can read and write GPT.
GUI partition editing tools such as GParted can also read and write GPT; GParted is built on the libparted library rather than on gdisk's code.
For backwards compatibility with MBR the GPT isn't stored in sector 0 of a device, avoiding the possibility that tools that don't 'know' about GPT blindly over-write GPT data. The Primary GPT Header is in sector 1, followed (on a device with 512-byte sectors) by 32 sectors each describing 4 partitions (128 bytes per partition entry). The back-up GPT is stored at the end of the device: the Secondary GPT Header is in the last logical sector, and the back-up copy of the partition entries sits in the sectors immediately before it, so the layout mirrors the primary in reverse order.
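The sector numbers implied by that layout can be worked out with a little arithmetic. This is only a sketch of the minimal layout, assuming the default 128 entries of 128 bytes each; the function name gpt_layout is made up, and real partitioning tools usually start the first partition later than the first usable sector for alignment.

```shell
#!/bin/sh
# gpt_layout TOTAL_SECTORS [SECTOR_SIZE] [ENTRIES] [ENTRY_SIZE]
# Print the sector numbers occupied by the GPT structures on a device.
gpt_layout() {
    total=$1
    ss=${2:-512}
    entries=${3:-128}
    esize=${4:-128}
    # Sectors needed for the partition entry array, rounded up.
    array=$(( (entries * esize + ss - 1) / ss ))
    echo "protective_mbr=0"
    echo "primary_header=1"
    echo "primary_entries=2-$(( 1 + array ))"
    echo "first_usable=$(( 2 + array ))"
    echo "last_usable=$(( total - array - 2 ))"
    echo "backup_entries=$(( total - array - 1 ))-$(( total - 2 ))"
    echo "backup_header=$(( total - 1 ))"
}

# A 1GiB device with 512-byte sectors has 2097152 sectors.
gpt_layout 2097152
```

With 512-byte sectors the entry array takes 32 sectors, so the first usable sector is 34; with 4,096-byte sectors the same 16,384 bytes of entries fit in 4 sectors.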
GPT tools can install a Protective MBR which is an MBR with a single partition entry (type 0xEE) that covers the entire disk. This prevents MBR tools that are not aware of GPT from allocating space on the disk without the user first specifically deleting the defined partition, which should give that user pause for thought as to why there is a partition there and what is in it.
GPT tools can also install a Hybrid MBR which is an extension of the Protective MBR. As well as the protective partition entry it can use the remaining three primary partition entries to point to three GPT partitions, so that the start, size, and flags match. That allows a GPT-based disk to be bootable on firmware that doesn't support GPT.
↥PC Firmware
↥Power On
At power-on the CPU begins executing code to enable Random Access Memory (RAM), reads a hard-coded description of the system configuration, configures core devices (Motherboard Controllers, Graphical Processing Unit (GPU), Input devices (Keyboard, Mouse), and Disk Controllers) and then probes for attached devices.
There are two families of firmware for doing this, the second of which has gone through two generations:
- BIOS - introduced in the first IBM Personal Computers in 1981
- EFI - originated at Intel as the Intel Boot Initiative in 1998, later renamed Extensible Firmware Interface
- UEFI - development of EFI was taken over by the open development Unified EFI Forum in 2005 and UEFI v2.1 was released in 2007
Note: there are other systems such as coreboot and Open Firmware but they are very rarely used on industry standard PCs. You're likely to find these on ARM or MIPS based systems, especially single board computers (SBCs).
When the PC's power is switched on, the first thing that happens involving executable code is the CPU resetting to a known state. On 32-bit and later x86 CPUs this sets the Code Segment (CS) base to 0xFFFF0000 and the Instruction Pointer (IP) register to 0xFFF0, so execution starts at the fixed memory address 0xFFFFFFF0, which is 16 bytes below the top of the 4GiB of addressable memory, and known as the Reset Vector. The CPU reads the instructions found there and executes them. These instructions are part of the PC's Firmware, nominally stored in ROM - although in practice it is some form of NVRAM.
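The Reset Vector address can be checked with shell arithmetic. This sketch assumes the documented x86 behaviour: the original 8086 starts at segment 0xFFFF, offset 0x0000 (Real Mode address = segment x 16 + offset), whereas 80386 and later CPUs start with a hidden CS base of 0xFFFF0000 and IP 0xFFF0. The function names are made up for the example.

```shell
#!/bin/sh
# real_mode_addr SEGMENT OFFSET
# Physical address in Real Mode: segment * 16 + offset.
real_mode_addr() {
    printf '0x%X\n' $(( $1 * 16 + $2 ))
}

# reset_addr CS_BASE IP
# Physical address when the CPU uses an explicit segment base, as at reset.
reset_addr() {
    printf '0x%X\n' $(( $1 + $2 ))
}

real_mode_addr 0xFFFF 0x0000     # 8086 reset vector: prints 0xFFFF0
reset_addr 0xFFFF0000 0xFFF0     # 80386 and later: prints 0xFFFFFFF0
real_mode_addr 0x0000 0x7C00     # where BIOS loads sector 0: prints 0x7C00
```

Both reset addresses sit 16 bytes below the top of their address space (1MiB and 4GiB respectively), which leaves just enough room for a jump instruction into the firmware proper.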
The firmware is installed by the motherboard manufacturer. Its purpose is to put the essential hardware required to load an operating system and interact with the user into a known, usable, state. This includes initialising the Random Access Memory (RAM), system controllers, video display system, input devices, mass storage devices, USB, network interfaces and more. It also scans for removable devices such as DVD-ROM, USB mass storage devices, IEEE1394 (Firewire) hosts, and Preboot eXecution Environment (PXE) network connections.
Most firmwares offer an ordered list of preferred boot devices and other minimal configuration values, which originally were saved in the real-time clock's (RTC) battery-backed non-volatile Complementary Metal Oxide Semiconductor (CMOS) RAM of around 127 bytes. Replacement of the button cell batteries used to be a frequent task for PC engineers when configuration and date/time settings were lost. CMOS was rapidly replaced by other Non Volatile Random Access Memory (NVRAM).
Originally the firmware was stored in ROM (Read Only Memory) which was one-time programmable (at the factory) but as it became clear the boot code would need extending to support more hardware and functionality, and to fix bugs, it was moved to EEPROM and later Flash memory devices. This allows tools running on the PC to write replacement firmware into the device without expensive engineer visits or return to factory.
Nowadays almost all firmware non-volatile memory is either Electrically Erasable Programmable Read Only Memory (EEPROM) or its successor, Flash memory (for firmware this is typically NOR rather than NAND Flash).
↥Basic Input Output System (BIOS)
This is the firmware used in the very first Industry Standard PCs designed and introduced by IBM in 1981 and extended continuously since. It has to maintain a massive amount of backward-compatible code which is almost never required nor used.
Whilst executing, the BIOS keeps the CPU in Real Mode, which uses segmented addressing and provides no memory protection, so code can directly access any address in the 1MiB Real Mode address space without restriction.
There is usually a manual Boot over-ride key that can be pressed during power-on self-test which will present the list of boot devices in menu form for selection of a device on a one-off basis.
↥Detecting Bootable Device
BIOS does not hand over control to a boot device until it has verified that there should be valid boot-strap code on the device. First it confirms that sector 0 ends with the MBR signature bytes 0x55 0xAA. Some BIOSes will also require one of the Primary Partition entries to have its active boot flag set, or will report No Operating System - which can be very confusing to debug if you're not aware of the idiosyncrasy.
If these checks are successful BIOS will read sector 0 into memory at 0x7C00 and then do a jmp 0x7C00 to hand over execution to the boot-strap code. If there is no valid boot-strap code the system will freeze with a flashing cursor at top left of the screen. BIOS will usually not be used again until the system next restarts - although the OS can call on BIOS services if it switches the CPU back into Real Mode.
On BIOS systems many users will be familiar with the need to install Microsoft Windows before a Linux installation due to Windows assuming it is the only operating system installed. Re-installing Windows causes the same problem, namely, that GRUB's MBR boot-strap code is replaced with the Windows boot-strap code, which results in GRUB not starting and the Linux installation being un-bootable until repaired (with the re-installation of GRUB's boot-strap into the MBR).
As we'll see, UEFI addresses this issue and solves it elegantly.
↥Unified Extensible Firmware Interface (UEFI)
UEFI helps to avoid entire classes of boot-loader and boot-manager bugs, especially those involving multiple operating systems installed to the same system.
UEFI moves responsibility for boot management into the firmware. It provides a well-defined interface that operating systems can use to install themselves without interfering with other operating system installations. UEFI stores the OS boot menu options in its NV-RAM. These entries have a user-friendly label (e.g. "Ubuntu") and the file-system path to the OS's boot-loader.
In order for UEFI to be able to read files from file-systems the specification requires an EFI System Partition (ESP), identified by gdisk short-code 0xEF00 on GPT or type 0xEF on MBR, formatted with a File Allocation Table (FAT) file-system. This partition is usually no more than 512MB and typically doesn't need to be more than 128MB. The file-system variation can depend on the size but will be one of FAT12, FAT16 or FAT32. The numbers denote how many binary digits (bits) are used in each entry in the file allocation table, and thus how many clusters (allocation units of one or more sectors) the file-system can address.
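To make the link between size and FAT variation concrete, here is a small sketch using the cluster-count thresholds from Microsoft's FAT specification (fewer than 4,085 clusters is FAT12, fewer than 65,525 is FAT16, otherwise FAT32). The function name is made up, and the cluster counts in the examples ignore the space taken by the FAT and reserved sectors, so treat the results as approximate.

```shell
#!/bin/sh
# fat_type_for_clusters CLUSTER_COUNT
# Report which FAT variation a volume with that many data clusters uses.
fat_type_for_clusters() {
    if [ "$1" -lt 4085 ]; then
        echo FAT12
    elif [ "$1" -lt 65525 ]; then
        echo FAT16
    else
        echo FAT32
    fi
}

# A 128MiB partition divided into 4KiB clusters: 32768 clusters.
fat_type_for_clusters $(( 128 * 1024 * 1024 / 4096 ))    # prints FAT16
# The same partition divided into 512-byte clusters: 262144 clusters.
fat_type_for_clusters $(( 128 * 1024 * 1024 / 512 ))     # prints FAT32
```

The cluster size is chosen when the file-system is created, which is why the same partition size can end up with different FAT variations.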
Although the UEFI specification allows MBR or GPT partitioning, UEFI-GPT has become the preferred arrangement and in many UEFI implementations UEFI boot mode will only work with GPT.
The UEFI specification defines the rules that operating systems must follow when installing their boot-loader and boot-manager into the ESP. Within the ESP the path convention is /EFI/$OS_NAME/$BOOT_LOADER. On Linux systems this ESP file-system is typically mounted to /boot/efi/ which results in files being accessible via /boot/efi/EFI/$OS_NAME/.
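Firmware boot entries record the loader's path relative to the root of the ESP, written with backslashes (for example \EFI\ubuntu\grubx64.efi). A minimal sketch of translating such a path to where the file appears on a running Linux system, assuming the ESP is mounted at /boot/efi/ (the function name and the optional mount-point argument are made up for this example):

```shell
#!/bin/sh
# esp_to_linux_path ESP_PATH [MOUNT_POINT]
# Translate a backslash-separated path inside the ESP into the path of the
# same file under the ESP's Linux mount point (default /boot/efi).
esp_to_linux_path() {
    mount_point=${2:-/boot/efi}
    printf '%s%s\n' "$mount_point" "$(printf '%s' "$1" | tr '\\' '/')"
}

esp_to_linux_path '\EFI\ubuntu\grubx64.efi'
# prints /boot/efi/EFI/ubuntu/grubx64.efi
```

This is handy when comparing the entries listed by efibootmgr -v with the files actually present on the ESP.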
UEFI firmware switches the CPU into Protected Mode (Long Mode on 64-bit systems), which can address all the physical memory and provide process isolation. This is the same mode the OS uses, which makes it possible for the OS to 'call' into the firmware for services whilst it is running. For example, this is used for adding OS entries to the boot menu.
↥Boot from Removable Media
Bootable removable media such as optical disks and USB mass storage devices will not have entries in the UEFI boot menu. To ensure these devices are bootable the UEFI specification defines the fixed location /EFI/BOOT/BOOT${MACHINE_TYPE}.efi, which for 64-bit AMD/Intel systems is /EFI/BOOT/BOOTX64.EFI, where the removable media must place a boot-loader in order to be bootable in UEFI mode.
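As an illustration, the sketch below maps a Linux machine name (as printed by uname -m) to the removable-media loader path. The machine-type suffixes are the ones I understand the UEFI specification to define; the function name is made up and the list of architectures is not exhaustive.

```shell
#!/bin/sh
# removable_loader_path MACHINE
# Print the fallback boot-loader path UEFI looks for on removable media,
# given a machine name in the form reported by `uname -m`.
removable_loader_path() {
    case $1 in
        x86_64)     suffix=X64 ;;
        i386|i686)  suffix=IA32 ;;
        aarch64)    suffix=AA64 ;;
        arm*)       suffix=ARM ;;
        riscv64)    suffix=RISCV64 ;;
        *)
            echo "unknown machine type: $1" >&2
            return 1
            ;;
    esac
    echo "/EFI/BOOT/BOOT${suffix}.EFI"
}

removable_loader_path x86_64    # prints /EFI/BOOT/BOOTX64.EFI
# For the running system: removable_loader_path "$(uname -m)"
```

FAT file names are case-insensitive, so BOOTX64.EFI and bootx64.efi refer to the same file on the ESP.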
Removable media can be created that will boot on both BIOS and UEFI systems, and from optical, USB mass storage, HDD/SSD. In this case it can be confusing for the user to select the correct boot-menu entry. Some firmwares will list the removable device twice with cryptic codes indicating UEFI or BIOS entries. This is an area the UEFI specification did not address and as a result manufacturers (who are responsible for providing the user interface for UEFI) each seem to go their own way.
↥Secure Boot
The Secure Boot option of UEFI firmware ensures that all UEFI driver modules and the operating system boot-loader are signed by a key that the system trusts. If not, the system will refuse to load the boot entry. Removable media often need Secure Boot to be disabled even when the boot-loader is signed, otherwise the media will not have an EFI boot option in the boot menu.
↥UEFI Shell
One of the many advantages of UEFI over BIOS is the ability to install and load additional modules and services. UEFI firmware should include a powerful shell - similar to BASH - from where it is possible to fix many boot-time issues quickly. Unfortunately many manufacturers either restrict the shell to the point it is barely useful, or leave it out altogether.
I highly recommend installing the Intel Tianocore project's EFI Shell version 2 for X64 architecture into the ESP and adding a boot menu entry for it, or putting it on removable media used for recovery as /EFI/BOOT/BOOTX64.EFI.
↥Compatibility Support Module (CSM)
To provide support for legacy BIOS installations UEFI usually contains a Compatibility Support Module (CSM).
CSM provides backwards compatibility for legacy BIOS boot devices. It uses the same boot device logic as described in the BIOS Detecting Bootable Device section. It is usually an option which can be enabled or disabled within the UEFI firmware configuration by the user. Without it enabled some removable media and legacy bootable drives will not boot.
↥Boot Loaders
Boot-loaders are an intermediate step between the system firmware and the operating system. On UEFI systems the Linux kernel is capable of being directly booted by the firmware without a boot-loader but generally a boot-loader is still used to provide fall-back, alternative kernel versions, recovery, memory testing, and other services.
When the boot-loader's boot-strap code begins executing it is still in the same CPU mode as the PC firmware. Some boot-loaders will switch the CPU into Protected Mode before handing over control to the operating system. In the case of Linux, the kernel does its own Protected Mode switch so most boot-loaders will hand over to Linux without switching.
↥GRand Unified Boot-loader (GRUB)
Version 2 of GRUB - the current version - is significantly different to the earlier version 1. I shall only discuss version 2 here since, except for systems built before 2010, most should be using it.
GRUB is a very versatile boot loader which supports many architectures - not just industry standard PCs based on Intel, AMD, and compatible CPUs. Here, I'm only discussing industry standard PCs.
There are two stages to obtain a working GRUB:
- Installation: installs boot-strap code that the system firmware loads and passes control to, and GRUB's core image that contains the device drivers GRUB needs to access the boot devices and read the configuration file (that describes the boot options and the menu optionally presented to users).
- Configuration: executes operating system scripts that generate the configuration file to control GRUB's operations.
↥Installation (grub-install)
grub-install is the tool. There are three modes of operation:
- BIOS MBR: writes the boot-strap code to sector 0 of the device and core.img to spare sectors following it.
- BIOS GPT: writes the boot-strap code to sector 0 of the device and core.img to a BIOS-boot partition (type 0xEF02) as raw data.
- EFI GPT: writes the EFI boot-loader to /boot/efi/EFI/$OS_NAME/grubx64.efi. Some Linux distributions will copy the GRUB modules and config to the UEFI ESP (e.g. Fedora) whilst others copy them to /boot/grub/ (e.g. Debian, Ubuntu, OpenSUSE). It also calls on efibootmgr to add a boot menu entry into the UEFI configuration of the system motherboard. This results in the system offering the names of installed operating systems in the boot menu after power-on.
There are wide-spread reports of problems with system boot-menu functionality caused by bugs in the customised UEFI the system manufacturers install. GRUB and Linux tools attempt to work around the well-known issues and can be used to test, diagnose, and sometimes repair systems with this class of bug.
In all modes grub-install also copies GRUB's dynamically loadable modules from /usr/lib/grub/$TARGET/ to /boot/grub/$TARGET/. $TARGET is architecture-specific; for BIOS mode it is i386-pc and for EFI x86_64-efi.
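A quick way to tell which target applies to a running system is to check for the /sys/firmware/efi directory, which the kernel only provides when it was booted through UEFI. The sketch below assumes a 64-bit Intel/AMD PC (32-bit UEFI firmware would need a different target); the function name and the optional root-directory argument, which exists only to make the check testable, are made up for this example.

```shell
#!/bin/sh
# grub_target [SYSROOT]
# Print the GRUB target matching how the system was booted.
# SYSROOT is an optional alternative root directory; the default is /.
grub_target() {
    sysroot=${1:-}
    if [ -d "$sysroot/sys/firmware/efi" ]; then
        echo x86_64-efi     # booted via UEFI
    else
        echo i386-pc        # booted via BIOS (or UEFI with CSM)
    fi
}

grub_target
# The modules for that target would then be found in:
#   /usr/lib/grub/$(grub_target)/
```

The same check is useful before re-installing GRUB from a live system: if the live system was booted in BIOS mode, efibootmgr cannot add an entry to the UEFI boot menu.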
↥Configuration (update-grub)
Operating system scripts, usually installed in /etc/grub.d/, generate the configuration file. User customisations are stored in the form of shell environment variables in /etc/default/grub. update-grub calls grub-mkconfig which combines the variables and the scripts and writes the configuration to /boot/grub/grub.cfg.
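For illustration, a fragment of /etc/default/grub might look like the following. The three variables are standard GRUB settings, but the values are only examples and not a recommendation for any particular system.

```shell
# /etc/default/grub (fragment): plain shell variable assignments
# Boot the first menu entry by default.
GRUB_DEFAULT=0
# Show the menu for 5 seconds before booting the default entry.
GRUB_TIMEOUT=5
# Extra kernel command line parameters for normal (non-recovery) entries.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"

# After editing, regenerate the configuration (as root):
#   update-grub
# which is equivalent to:
#   grub-mkconfig -o /boot/grub/grub.cfg
```

Because grub.cfg is regenerated each time, manual edits to it are lost; changes belong in /etc/default/grub or in the scripts under /etc/grub.d/.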
↥Boot Process
The GRUB boot process differs depending on the firmware type and firmware options.
↥BIOS
The boot-strap code is loaded from sector 0 of the device into memory at 0x7C00 and executed by the system's BIOS. Its only job is to load GRUB's core image into memory and pass execution control to core.
On MBR systems core.img will usually be loaded from the spare sectors between the MBR and the first partition, beginning at sector 1 (sector 0 contains the MBR).
↥UEFI
The system firmware reads the boot-loader entry saved into its NV-RAM, loads the boot-loader file and then hands over execution to it.
The particular file-name that is written into the boot-menu entry depends on whether UEFI SecureBoot is enabled on the system. For a regular non-secure entry \EFI\$OS_NAME\grubx64.efi is used.
Secure Boot
On SecureBoot systems \EFI\$OS_NAME\shim.efi is used. This shim can be one of two:
- Signed by the Microsoft Corporation UEFI CA key
- Signed by a user-specific key whose public certificate is installed in the UEFI firmware's key database
On Ubuntu these come from the packages shim-signed and shim respectively.
The shim contains the distributor's public signing certificate, which it uses to verify the signature of a signed version of GRUB's core image grubx64.efi before loading it. On Ubuntu this signed GRUB core image comes from the grub-efi-amd64-signed package.
Later, the signed core image will check the signature of the Linux kernel it is asked to start. That kernel will in turn check the signature on any dynamically loadable modules it is asked to use.
↥Core Image
Core's responsibility is to gain access to the file-system containing the /grub/ directory. It contains just the essential GRUB modules required to access devices and load the remaining GRUB modules from GRUB root (the file system containing the /grub/ directory). On regular systems it is re-created by grub-mkimage each time grub-install is called. On SecureBoot systems the pre-built and signed core images are installed (see the previous section). On EFI systems the core image is written to /boot/efi/EFI/$OS_NAME/grubx64.efi by grub-install.
If GRUB's root file-system is on a software RAID or LVM device the modules to read those devices will be included. If GRUB is configured to use a LUKS/dm-crypt encrypted root it will include the cryptodisk and supporting cryptographic algorithm modules.
If core fails to find and mount its root it will drop to a rescue shell where a limited set of commands (provided by the built-in modules) can be used. If, for example, cryptodisk prompted for the LUKS pass-phrase but the user typed it incorrectly, the shell prompt allows the user to re-try with the command cryptomount $DEVICE,$PARTITION (e.g. cryptomount hd0,gpt3). When using the rescue shell the user must manually load and execute the normal module (see the next section).
↥Normal (the Boot Menu)
When core has gained access to GRUB's root file-system it executes insmod normal and normal to load and execute the module that reads grub.cfg, a GRUB command-shell script that customises the boot-loader and describes the entries in the boot menu.
The user can now execute GRUB's command shell with access to all GRUB modules and functionality provided in root.
Usually, normal will render a boot menu - which is sometimes hidden - and wait a few seconds before booting the default menu entry. If the previous boot failed the timeout will not operate, which gives the user an opportunity to manually select and modify a boot entry to achieve a successful operating system start.
For a Linux OS GRUB needs to know the name of the kernel image, any kernel command line parameters the OS requires, and its associated initial RAM-disk image. Even without a good menu entry Linux can be started manually from the GRUB command line with e.g.:
linux vmlinuz-3.16.0-031600-generic root=/dev/sda1 ro
initrd initrd.img-3.16.0-031600-generic
boot
When starting Linux GRUB does not switch the CPU into Protected Mode because Linux does that itself, but for non-Linux operating systems GRUB will do so before handing over control.
↥Operating System
After CPU control is handed over to the kernel by the boot-loader the kernel must first decompress itself, since usually the kernel image is stored in compressed form on disk. Once uncompressed the kernel begins its early initialisation, which starts with switching the CPU into protected mode, setting up virtual memory addressing, and beginning to discover and configure the core hardware sub-systems such as ACPI and PCI.
Next, the kernel creates a tmpfs file-system and unpacks a very small root file-system stored inside the vmlinuz kernel image file. It then checks whether the boot-loader loaded an Initial RAM disk image (initrd.img) into memory and if so, it decompresses and extracts that into the tmpfs.
At this point the kernel hands over to user-space by executing the /init shell script.
↥Initial RAM disk (initrd)
An initrd.img is built by the update-initramfs tool for each installed kernel. As well as the static contents shipped by the distribution in /usr/share/initramfs-tools/, other packages, or the user, can have additional content copied in from /etc/initramfs-tools/, notably any additional configuration and scripts from the respective conf.d/ and scripts/ sub-directories.
Additional scripts from hooks/ are also called, which are able to install additional resources. In particular, /usr/share/initramfs-tools/hooks/busybox installs the shell interpreter which is symbolically linked from /bin/sh and will be used to execute the /init shell script.
Here, for example, is a custom encrypted disk key-file script I use to install a key-file inside the initrd, /etc/initramfs-tools/hooks/01_luks_keyfile.sh:
#!/bin/sh
# Path of the key-file on the running system; export KEYFILE to override it.
KEYFILE=${KEYFILE:-/path/to/.keyfile}
# Create the key-file's directory inside the image, then copy the file in.
mkdir -p "$DESTDIR/${KEYFILE%/*}" && cp "$KEYFILE" "$DESTDIR/$KEYFILE"
This simply takes a file (whose name is stored in the variable KEYFILE), ensures the target directory under $DESTDIR in the image exists, and copies the file into the image. See the HOOK SCRIPTS and Exported Variables sections of man 8 initramfs-tools for more details.
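The hook's logic can be tried safely outside of update-initramfs by pointing it at a scratch directory instead of a real image. The sketch below wraps the same two commands in a function; the name install_keyfile and its arguments are made up for this example, with the second argument standing in for the $DESTDIR that initramfs-tools exports to hook scripts.

```shell
#!/bin/sh
# install_keyfile KEYFILE DESTDIR
# Copy KEYFILE into DESTDIR, preserving its full path below DESTDIR,
# in the same way the hook script copies the key-file into the image.
install_keyfile() {
    keyfile=$1
    destdir=$2
    # ${keyfile%/*} strips the file name, leaving the directory part.
    mkdir -p "$destdir/${keyfile%/*}" && cp "$keyfile" "$destdir/$keyfile"
}

# Example with scratch directories:
#   scratch=$(mktemp -d)
#   install_keyfile /path/to/.keyfile "$scratch"
#   find "$scratch" -type f
```

Once a real image has been rebuilt, lsinitramfs (from initramfs-tools) can list its contents to confirm the key-file was included.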
When the kernel hands over to the /init script it is executed by the busybox shell interpreter due to the shebang at line 1 of the script (#!/bin/sh). The script can be found on a regular running system at /usr/share/initramfs-tools/init along with the standard configuration, hooks, and scripts directories where other packages can install their own scripts.
↥Init Daemon
TODO - sysvinit vs systemd (sigh!)