© 2014 TJ <hacker@iam.tj> Work in progress, more to come; last updated 2014-08-22 08:30 UTC
I have created this guide to aid in understanding and trouble-shooting boot-time issues with Linux. In a companion article I shall be providing an extensive trouble-shooting flow-chart to help identify and solve problems quickly.
This page uses pure HTML5 and Scalable Vector Graphics (SVG) for all illustrations. SVG allows perfect rendering of images and text at any resolution or zoom factor. Most modern browsers should correctly support SVG - please let me know if you use a browser that can't render the illustrations correctly.
There is considerable confusion amongst Intel/AMD CPU-based PC users, support agents, and even engineers over how the seemingly voodoo process of booting a PC occurs. With the Basic Input Output System (BIOS) being replaced through the widespread adoption of the Unified Extensible Firmware Interface (UEFI) the level of confusion and misinformation has multiplied, leading to incorrect recommendations by otherwise knowledgeable people, and distrust of UEFI in others.
Prior to 2007 (U)EFI was mainly the province of Intel servers. When Microsoft announced in 2011 that Certified Windows 8 PCs would require UEFI Secure Boot to be enabled it pushed motherboard and systems manufacturers to adopt UEFI across their entire range if they wanted to use Microsoft's "Windows Ready" trademarks and marketing materials.
As a result the Linux development community, from the kernel, boot-loader, tools and installer developers to the distribution packagers, was forced to react so that Linux would not be prevented from installing on new PCs. There has been, and continues to be, a concerted effort to cope with the poorly written and customised UEFI firmware images shipped by motherboard manufacturers in particular.
These motherboard bugs cause many problems for users wishing to install Linux on their PCs. Those users frequently bring their problems to the support forums and Internet Relay Chat (IRC) channels for help. This guide has been developed in response to my experiences of repeatedly explaining and fixing the boot process for users in the #ubuntu support channels.
The Disk Label, also called the partitioning scheme, is a small block of data that describes how the sectors (each containing a block of data) on the device are divided into logically separated partitions each of which can contain some operating-system defined data structure, such as: a file-system, a swap (memory paging) file, a Logical Volume Management (LVM) Physical Volume (PV), an encrypted volume such as Linux Unified Key Setup (LUKS) or TrueCrypt, a Redundant Array of Independent (formerly Inexpensive) Disks (RAID) member, or any other type of data.
The partition table will store information about each partition, such as the starting sector offset (sector 0 is the first), the number of sectors it uses, a type indicator (indicates but doesn't guarantee the type of data in the partition), and flags that indicate status, such as which is the active (bootable) partition.
Historically PC disk drives used disks with physical sectors that contain 512 bytes of data. As drive capacities have grown into the terabyte range (1,000,000,000,000 bytes) the physical sectors have increased in size; on large 'Advanced Format' drives they are typically 4,096 bytes.
To maintain backwards compatibility these large-sector drives usually have a compatibility mode that will read and write 512-byte logical sectors - internally this is achieved by reading/writing the entire 4,096-byte physical sector that contains the required 512-byte logical sector.
Partitioning schemes therefore need a way to indicate what size sectors their offsets and sizes are measured in - sector 4 of a 512-bytes-per-sector drive is 2,048 bytes into the drive, whereas on a 4,096-bytes-per-sector drive sector 4 is 16,384 bytes into the drive. When operating systems are aware of larger sectors the logical sectors can be the same size as the physical, which improves performance and reduces complexity.
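You can query the sector sizes a drive reports with the blockdev tool from util-linux; a minimal sketch, where the device name is just an example:
# Print the logical then the physical sector size; /dev/sda is an example
sudo blockdev --getss --getpbsz /dev/sda
# An Advanced Format drive typically prints: 512 then 4096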
Spinning platter disks (known as Hard Disk Drives - HDD) are now being superseded by Solid State Drives (SSD) using 'flash' memory technology. Despite these devices having an entirely different physical construction and memory access method, they are often referred to as 'disks' and most of the rotating-disk-platter conventions are used in describing their use.
Sector 0 of the device effectively contains three structures:
Four partition entries, each a 16-byte structure (64 bytes total). The table starts 66 bytes from the end of the 512-byte sector, at offset 446 (0x1BE).
Each structure contains the offset (from the start of the disk) to the first sector of the partition in logical sectors, the number of logical sectors used, the partition type code (indicating what the contents of the partition are expected to be), and a boot flag.
The final 2 bytes of the sector store a signature, 0x55AA (hexadecimal, base 16), which when found indicates to the BIOS that a valid partition table can be found in the sector.
The first 440 bytes of sector 0 contain executable code. This boot-strap code is installed by the system Boot Loader (e.g. GRUB) and is just sufficient to load much more extensive executable code from another location on the disk.
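A quick way to see all three structures, a sketch assuming the first drive is /dev/sda, is to dump sector 0 with dd and xxd: the boot-strap code fills bytes 0-439, the partition table starts at offset 446 (0x1BE), and the signature bytes 55 AA end the sector at offset 510 (0x1FE).
# Dump the MBR (sector 0) and show the final lines containing
# the partition table and the 55 AA signature
sudo dd if=/dev/sda bs=512 count=1 2>/dev/null | xxd | tail -6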
As drive capacities increased 4 primary partitions were found to be insufficient. Additional logical partitions can be created inside Extended Partitions. One of the Primary Partition Table's entries describes the location of the first Extended Partition Table, which can describe up to an additional 4 logical partitions or contain 1 partition entry and an entry linking to another Extended Partition Table. This linking can continue quite deep; I think Linux can support up to 127 Extended partitions.
To overcome the limitations and complexity of MBR and Extended Partitions, the GUID Partition Table (GPT) specification was developed. This accommodates 128 primary partitions by default and that number can be increased. A back-up copy is placed at the end of the device to help recovery from a corrupted primary GPT, and it uses Globally Unique Identifiers (GUID) to identify partition types. GUIDs have the form C12A7328-F81F-11D2-BA4B-00A0C93EC93B, which is easy for the computer to handle but difficult for people, so tools such as gdisk (which can read and write GPT) present hexadecimal short-codes representing a 2-byte number, e.g. an EFI System Partition (ESP) is 0xEF00.
For backwards compatibility with MBR the GPT isn't stored in sector 0 of a device, avoiding the possibility that tools that don't 'know' about GPT blindly over-write GPT data. The Primary GPT Header is in sector 1, followed by 32 sectors each describing 4 partitions (128 bytes per partition entry). The back-up GPT is stored at the end of the device: the Secondary GPT Header occupies the last logical sector, immediately preceded by a back-up copy of the 32 sectors of partition entries.
GPT tools can install a Protective MBR, which is an MBR with a single partition entry (type 0xEE) that covers the entire disk. This prevents MBR tools from allocating space on the disk without the user first specifically deleting the defined partition, which should give that user pause for thought as to why there is a partition there and what is in it.
GPT tools can also install a Hybrid MBR, which is an extension of the Protective MBR. As well as the protective partition entry it can use the remaining three primary partition entries to point to three GPT partitions, so that the start, size, and flags match. That allows a GPT-based disk to be bootable on both BIOS and UEFI systems.
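To see all of this on a live system, gdisk can list the GPT and reports whether the MBR it found is protective or hybrid; the device name here is illustrative:
# Shows the partition table scan results (MBR: protective/hybrid)
# followed by the GPT partition entries and their type short-codes
sudo gdisk -l /dev/sda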
At power-on the CPU begins executing code to enable Random Access Memory (RAM), reads a hard-coded description of the system configuration, configures core devices (Motherboard Controllers, Graphics Processing Unit (GPU), Input devices (Keyboard, Mouse), and Disk Controllers) and then probes for attached devices.
There are two systems for doing this:
When the PC's power is switched on the first thing that happens that involves executable code is the CPU resetting to a known state. This includes loading its Code Segment (CS) and Instruction Pointer (IP) registers (0xF000:FFF0, with the segment based at 0xFFFF0000) so that execution begins at physical address 0xFFFFFFF0, which is 16 bytes below the top of addressable memory and known as the Reset Vector. The CPU reads the instructions found there and executes them. These instructions are part of the PC's Firmware, usually stored in Electrically Erasable Programmable Read Only Memory (EEPROM) Integrated Circuits (ICs) - better known as one of many types of Silicon Chip.
The firmware is installed by the motherboard manufacturer. Its purpose is to put the essential hardware required to load an operating system and interact with the user into a known, usable, state. This includes initialising the Random Access Memory (RAM), system controllers, video display system, input devices, mass storage devices, USB, network interfaces and more. It also scans for removable devices such as DVD-ROM, USB mass storage devices, IEEE1394 (Firewire) hosts, and Preboot eXecution Environment (PXE) network connections.
This is the firmware used in the very first PCs introduced by IBM in 1981 and extended continuously since. It has to maintain a massive amount of backward-compatible code which is almost never required nor used. Most BIOS offer an ordered list of preferred boot devices saved in the real-time clock's (RTC) non-volatile Complementary Metal Oxide Semiconductor (CMOS) RAM. BIOS keeps the CPU in Real Mode, where segmented addressing limits it to the first 1MiB of memory but allows direct access to that memory without restriction.
There is usually a manual Boot over-ride key that can be pressed during power-on self-test which will present the list of boot devices in menu form for selection of a device on a one-off basis.
BIOS does not hand over control to a boot device until it has verified that there should be valid boot-strap code on the device.
First it confirms the MBR signature is 0x55AA. Some BIOS will also require one of the Primary Partition entries to have its active boot flag set, or will report No Operating System - which can be very confusing to debug if you're not aware of the idiosyncrasy.
If these checks are successful BIOS will read sector 0 into memory at 0x7C00 and then do a jmp 0x7C00 to hand over execution to
the boot-strap code. If there is no valid boot-strap code the system will freeze with a flashing cursor at top left of the screen. BIOS
will usually not be used again until the system next restarts, although the OS can call on BIOS services if it switches the CPU back into Real Mode.
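Two quick checks follow from the above, sketched here with an example device: confirming the signature bytes, and setting the active flag when a BIOS insists on one (using sfdisk from a recent util-linux):
# Read the two signature bytes at offset 510; expect "55aa"
sudo dd if=/dev/sda bs=1 skip=510 count=2 2>/dev/null | xxd
# Mark primary partition 1 active if the BIOS reports No Operating System
sudo sfdisk --activate /dev/sda 1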
UEFI helps to avoid entire classes of boot-loader and boot-manager bugs, especially those involving multiple operating systems installed on the same system. On BIOS systems many users will be familiar with the need to install Microsoft Windows before a Linux installation because Windows assumes it is the only operating system installed. Re-installing Windows causes the same problem: GRUB's MBR boot-strap code is replaced with the Windows boot-strap code, so GRUB never starts and the Linux installation is un-bootable until GRUB's boot-strap is re-installed into the MBR.
UEFI moves responsibility for boot management into the firmware. It provides a well-defined interface that operating systems can use to install themselves without interfering with other operating system installations. UEFI stores the OS boot menu options in its NV-RAM. These entries have a user-friendly label (e.g. "Ubuntu") and the file-system path to the OS's boot-loader.
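The efibootmgr tool reads and writes these NV-RAM entries from within Linux. A minimal sketch - the disk, partition number, label and loader path are all illustrative:
# List existing boot entries with their labels and loader paths
efibootmgr -v
# Create an entry pointing at a loader on the ESP (values illustrative)
sudo efibootmgr -c -d /dev/sda -p 1 -L "Ubuntu" -l '\EFI\ubuntu\grubx64.efi'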
In order for UEFI to be able to read files from file-systems the specification requires an EFI System Partition (ESP) - GPT type 0xEF00, MBR type 0xEF - formatted with a File Allocation Table (FAT) file-system. This partition is usually no more than 512MB and typically doesn't need to be more than 128MB. The file-system variant can depend on the size but will be one of FAT12, FAT16 or FAT32. The numbers denote how many binary digits (bits) are used in each entry of the file allocation table, and thus how large the maximum sector offset into the file-system can be.
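Creating an ESP by hand might look like this sketch using sgdisk and dosfstools; the device name and size are examples:
# New 128MB partition number 1, GPT type EF00 (EFI System Partition)
sudo sgdisk --new=1:0:+128M --typecode=1:EF00 /dev/sda
# Format it with FAT32
sudo mkfs.fat -F32 /dev/sda1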
Although the UEFI specification allows MBR or GPT partitioning, UEFI-GPT has become the preferred arrangement and in many UEFI implementations UEFI boot mode will only work with GPT.
The UEFI specification defines the rules that operating systems must follow when installing their boot-loader and boot-manager into the ESP.
Within the ESP the path convention is /EFI/$OS_NAME/$BOOT_LOADER. On Linux systems this ESP file-system is typically mounted to
/boot/efi/ which results in files being accessible via /boot/efi/EFI/$OS_NAME/.
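With the ESP mounted there, the boot-loader each operating system has installed can be listed; a small sketch:
# Show every boot-loader file installed into the ESP
find /boot/efi/EFI -maxdepth 2 -type f -iname '*.efi'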
UEFI firmware switches the CPU into Protected Mode (Long Mode on 64-bit firmware), which can address all the physical memory and provides process isolation. This is the same mode the OS uses, which makes it possible for the OS to 'call' into the firmware for services whilst it is running. For example, this is used for adding OS entries to the boot menu.
Bootable removable media such as optical disks and USB mass storage devices will not have entries in the UEFI boot menu. To ensure these devices are
bootable the UEFI specification defines the fixed location /EFI/BOOT/BOOT${MACHINE_TYPE}.efi, which for 64-bit AMD/Intel systems is
/EFI/BOOT/BOOTX64.EFI, where the removable media must place a boot-loader in order to be bootable in UEFI mode.
Removable media can be created that will boot on both BIOS and UEFI systems, and from optical, USB mass storage, and HDD/SSD devices. In this case it can be confusing for the user to select the correct boot-menu entry. Some firmware implementations will list the removable device twice with cryptic codes indicating the UEFI or BIOS entry. This is an area the UEFI specification did not address and as a result manufacturers (who are responsible for providing the user interface for UEFI) each seem to go their own way.
The Secure Boot option of UEFI firmware ensures that the boot-loader is signed by an encryption key that the system trusts. If it is not, the firmware will refuse to load the boot entry. Removable media often needs Secure Boot disabled even when the boot-loader is signed, otherwise the media will not have an EFI boot option in the boot menu.
One of the many advantages of UEFI over BIOS is the ability to install and load additional modules and services. UEFI firmware should include a powerful shell - similar to bash - from which it is possible to fix many boot-time issues quickly. Unfortunately many manufacturers either restrict the shell to the point it is barely useful, or leave it out altogether. In these cases I highly recommend installing the Intel Tianocore project's EFI Shell version 2 for the X64 architecture into the ESP and adding a boot menu entry for it, or putting it on removable media used for recovery as /EFI/BOOT/BOOTX64.EFI.
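Putting the shell onto a FAT-formatted USB stick is simple; the mount point and the downloaded file name here are assumptions:
# Make the stick boot straight into the EFI Shell on 64-bit UEFI systems
mkdir -p /mnt/usb/EFI/BOOT
cp Shell.efi /mnt/usb/EFI/BOOT/BOOTX64.EFI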
To provide support for legacy BIOS installations UEFI usually contains a Compatibility Support Module (CSM), which provides legacy BIOS boot services for MBR and GPT devices using the same boot device logic described in the BIOS Detecting Bootable Device section. In most cases CSM can be enabled or disabled by the user in the firmware settings; without it enabled some removable media and legacy bootable drives will not boot. CSM is not available when SecureBoot is enabled.
Boot-loaders are an intermediate step between the system firmware and the operating system. On UEFI systems the Linux kernel is capable of being directly booted by the firmware without a boot-loader, but generally a boot-loader is still used to provide fall-backs, alternative kernel versions, recovery, memory testing, and other services.
When the boot-loader's boot-strap code begins executing it is still in the same CPU mode as the PC firmware. Some boot-loaders will switch the CPU
into Protected Mode before handing over control to the operating system. In the case of Linux, the kernel does its own Protected Mode switch so
most boot-loaders will hand over to Linux without switching.
Version 2 of GRUB - the current version - is significantly different to the earlier version 1. I shall only discuss version 2 here since, except on systems four or more years old, most installations will be using version 2. GRUB is a very versatile boot-loader which supports many architectures - not just industry standard PCs based on Intel, AMD, and compatible CPUs. Here, I'm only discussing industry standard PCs. There are two stages to a GRUB installation:
1. Installing the boot-strap code that the system firmware loads and passes control to, together with GRUB's core image containing the device drivers GRUB needs to access the boot devices and read the configuration file (which describes the boot menu presented to users).
2. Executing the operating system scripts that generate the configuration file controlling GRUB's operations.
grub-install is the tool. There are three modes of operation:
MBR/BIOS: writes the boot-strap code to sector 0 of the device and core.img to the spare sectors following it.
GPT/BIOS: writes the boot-strap code to sector 0 of the device and core.img to a BIOS-boot partition (type 0xEF02) as raw data.
UEFI: writes the EFI boot-loader to /boot/efi/EFI/$OS_NAME/grubx64.efi. Some Linux distributions copy the GRUB modules and config to the UEFI ESP (e.g. Fedora) whilst others copy them to /boot/grub/ (e.g. Debian, Ubuntu, OpenSUSE). grub-install also calls on efibootmgr to add a boot menu entry to the UEFI configuration in the system motherboard's NV-RAM, which results in the system offering the names of installed operating systems in the boot menu after power-on.
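The corresponding grub-install invocations look roughly like this sketch; the device name and boot-loader id are examples:
# BIOS modes: boot-strap into sector 0, core.img into the post-MBR gap
# or the BIOS-boot partition depending on the disk label
sudo grub-install --target=i386-pc /dev/sda
# UEFI mode: loader into the ESP plus an NV-RAM entry via efibootmgr
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=ubuntu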
There are wide-spread reports of problems with system boot-menu functionality caused by bugs in the customised UEFI the system manufacturers install. GRUB and Linux tools attempt to work around the well-known issues and can be used to test, diagnose, and sometimes repair systems with this class of bug.
In all modes grub-install also copies GRUB's dynamically loadable modules from /usr/lib/grub/$TARGET/ to /boot/grub/$TARGET/.
$TARGET is architecture-specific; for BIOS mode it is /boot/grub/i386-pc and for EFI /boot/grub/x86_64-efi/.
Operating system scripts, usually installed in /etc/grub.d/, generate the configuration file. User customisations are stored in the form of shell environment variables in /etc/default/grub. update-grub calls grub-mkconfig, which combines the variables and the scripts and writes the configuration to /boot/grub/grub.cfg.
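For example, on Debian/Ubuntu-style systems (the variable values shown are illustrative):
# /etc/default/grub holds the user customisations, e.g.:
#   GRUB_TIMEOUT=5
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
# then regenerate the configuration file:
sudo update-grub   # equivalent to: grub-mkconfig -o /boot/grub/grub.cfg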
The GRUB boot process differs depending on the firmware type and firmware options.
The boot-strap code is loaded from sector 0 of the device and executed in memory at 0x7C00 by the system's BIOS. Its only job is to load the core image into memory and pass execution control to core.
On MBR systems core.img will usually be loaded from the spare sectors between the MBR and the first partition, beginning
at sector 1.
The system firmware reads the boot-loader entry saved into its NV-RAM, loads the boot-loader file and then hands over execution to it.
The particular file-name that is written into the boot-menu entry depends on whether UEFI SecureBoot is enabled on the system.
For a regular non-secure entry \EFI\$OS_NAME\grubx64.efi is used.
On SecureBoot systems \EFI\$OS_NAME\shim.efi is used. This shim comes from one of two packages: shim-signed (a build signed with Microsoft's UEFI CA key) or shim (unsigned).
The shim carries the distributor's public signing certificate, and uses it to load and verify the signature of a signed version of GRUB's core image, grubx64.efi. On Ubuntu this signed GRUB core image comes from the grub-efi-amd64-signed package.
Later, the signed core image will check the signature of the Linux kernel it is asked to start. That kernel will in turn check the signature on any dynamically loadable modules it is asked to use.
Core's responsibility is to gain access to the file-system containing the /grub/ directory. It contains just the essential
GRUB modules required to access devices and load the remaining GRUB modules from GRUB root (the file system containing the
/grub/ directory). On regular systems it is re-created by grub-mkimage each time grub-install is
called. On SecureBoot systems the pre-built and signed core images are installed (see the previous section).
Core image is written to /boot/efi/EFI/$OS_NAME/grubx64.efi by grub-install.
If GRUB's root file-system is on a software RAID or LVM device the modules to read those devices will be included.
If GRUB is configured to use a LUKS/dm_crypt encrypted root it will include the cryptodisk and supporting cryptographic algorithm modules.
If core fails to find and mount its root it will drop to a rescue shell where a limited set of commands (provided by the
built-in modules) can be used. If, for example, cryptodisk prompted for the LUKS pass-phrase but the user typed it incorrectly, the shell
prompt allows the user to re-try with the command cryptomount $DEVICE,$PARTITION (e.g. cryptomount hd0,gpt3).
When using the rescue shell the user must manually load and execute the normal module (see the next section).
When core has gained access to GRUB's root file-system it executes insmod normal and normal to load
and execute the module that reads grub.cfg, a shell script that customises the boot-loader and describes the entries in the boot
menu.
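A typical manual recovery from the rescue shell looks like this sketch; the device and partition names are illustrative:
grub rescue> ls                                 # discover devices/partitions
grub rescue> set root=(hd0,gpt2)
grub rescue> set prefix=(hd0,gpt2)/boot/grub    # GRUB's root file-system
grub rescue> insmod normal
grub rescue> normal                             # load the full boot menu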
The user can now execute GRUB's command shell with access to all GRUB modules and the functionality provided in GRUB's root.
Usually, normal will render a boot menu - which is sometimes hidden - and wait a few seconds before booting the default
menu entry. If the previous boot failed the timeout will not operate which gives the user an opportunity to manually select and modify
a boot entry to achieve a successful operating system start.
For a Linux OS GRUB needs to know the name of the kernel image, any kernel command line parameters the OS requires, and its associated initial RAM-disk image. Even without a good menu entry Linux can be started manually from the GRUB command line with, e.g.:
linux vmlinuz-3.16.0-031600-generic root=/dev/sda1 ro
initrd initrd.img-3.16.0-031600-generic
boot
When starting Linux GRUB does not switch the CPU into Protected Mode because Linux does that itself, but for non-Linux operating systems GRUB will do so before handing over control.
After CPU control is handed over to the kernel by the boot-loader the kernel must first uncompress itself, since the kernel image is usually stored in compressed form on disk. Once uncompressed, the kernel begins its early initialisation, which starts with switching the CPU into protected mode, setting up virtual memory addressing, and beginning to discover and configure the core hardware sub-systems such as ACPI and PCI.
Next, the kernel creates a tmpfs file-system and unpacks a very small root file-system stored inside the vmlinuz kernel
image file. It then checks whether the boot-loader loaded an Initial RAM disk image (initrd.img) into memory and if so, it uncompresses
and extracts that into the tmpfs.
At this point the kernel hands over to user-space by executing the /init shell script.
An initrd.img is built by the update-initramfs tools for each installed kernel. As well as the static contents shipped by
the distribution in /usr/share/initramfs-tools/ other packages, or the user, can have additional content copied in from
/etc/initramfs-tools/, notably any additional configuration and scripts from the respective conf.d/ and
scripts/ sub-directories.
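After changing any of those files the image must be rebuilt; a short sketch, taking the kernel version from the running system:
# Rebuild the initrd for the running kernel, then peek inside it
sudo update-initramfs -u
lsinitramfs /boot/initrd.img-$(uname -r) | head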
Additional scripts from hooks/ are also called, which are able to install additional resources. In particular /usr/share/initramfs-tools/hooks/busybox installs the shell interpreter that is symbolically linked from /bin/sh and will be used to execute the /init shell script.
Here, for example, is a custom encrypted-disk key-file hook I use to install a key-file inside the initrd, /etc/initramfs-tools/hooks/01_luks_keyfile.sh:
#!/bin/sh
# initramfs-tools calls every hook with "prereqs" first; this hook has none
case "$1" in prereqs) echo; exit 0 ;; esac
KEYFILE=${KEYFILE:-/path/to/.keyfile}
# $DESTDIR is the staging directory for the initrd image being built
mkdir -p "$DESTDIR${KEYFILE%/*}" && cp "$KEYFILE" "$DESTDIR$KEYFILE"
This simply takes a file (whose name is stored in the variable KEYFILE), ensures the target directory in the image exists, and copies the file into the image. The prereqs stanza at the top is required of every initramfs-tools hook so the tools can order hook execution.
When the kernel hands over to the /init script it is executed by the busybox shell interpreter due to the
shebang at line 1 of the script (#!/bin/sh).