Skip to main content

Understanding File System Superblock in Linux

Extended Filesystem being the default file system in Linux, we will be focusing ext file system in this article to understand super blocks.

Before we get to understanding Super blocks in a file system, let’s understand some common terminologies and building blocks of a file system.

 

Blocks in File System



When a partition or disk is formatted, the sectors in the hardisk is first divided into small groups. This groups of sectors is called as blocks. The block size is something that can be specified when a user formats a partition using the command line parameters available.

1
mkfs -t ext3 -b 4096 /dev/sda1


In the above command we have specified block size while formatting /dev/sda1 partition. The size specified is in bytes. So basically one block will be of 4096 bytes.

Block Size for Ext2 can be 1Kb, 2Kb, 4Kb, 8Kb
Block Size for Ext3 can be 1Kb, 2Kb, 4Kb, 8Kb
Block Size for Ext4 can be 1Kb to 64Kb


The block size you select will impact the following things

  • Maximum File Size
  • Maximum File System Size
  • Performance


The reason block size has an impact on performance is because, the file system driver sends block size ranges to the underlying drive, while reading and writing things to file system. Just imagine if you have a large file, reading smaller blocks (which combined together makes the file size) one by one will take longer. So the basic idea is to keep bigger block size, if your intention is to store large files on the file system.

Less IOPS will be performed if you have larger block size for your file system.

And if you are willing to store smaller files on the file system, its better to go with smaller block size as it will save a lot of disk space and also from performance perspective.
Hard disk sector is nothing but a basic storage unit of the drive, which can be addressed. Most physical drives have a sector size of 512 bytes. Please keep the fact in mind that hard disk sector is the property of the drive. Block size of file systems (that we just discussed above) is a software construct, and is not to be confused with the hard disk sector.
A linux Kernal performs all its operations on a file system using block size of the file system. The main important thing to understand here is that the block size can never be smaller than the hard disk's sector size, and will always be in multiple of the hard disk sector size. The linux Kernel also requires the file system block size to be smaller or equal to the system page size. 
Linux system page size can be checked by using the below command.

1
2
root@localhost:~# getconf PAGE_SIZE
4096


Block Groups in File System


The blocks that we discussed in the previous section are further grouped together to form block groups for ease of access during read and writes. This is primarily done to reduce the amount of time taken while reading or writing large amounts of data.
The ext file system divides the entire space of the partition to equal sized block groups(these block groups are arranged one after the other in a sequential manner). 

A typical partition layout looks something like the below at a very high level.
Linux Partition Layout with Block Groups

Number of blocks per group is fixed, and cannot be changed. Generally the number of blocks per block groups is 8*block size.

Lets see an output of mke2fs command, that displays few of the information that we discussed till now.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
root@localhost:~# mke2fs /dev/xvdf
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
6553600 inodes, 26214400 blocks
1310720 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
800 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
    4096000, 7962624, 11239424, 20480000, 23887872
Allocating group tables: done                            
Writing inode tables: done                            
Writing superblocks and filesystem accounting information: done


If you see the above output, it gives you the below details:
  • Block Size (4096 bytes)
  • 800 Block Groups
  • 32768 Blocks per group (which is 8*4096, as mentioned earlier)

It also shows the superblock backup locations on the partition. That is the block groups where superblock backups are stored.

What is File System Superblock?


The most simplest definition of Superblock is that, its the metadata of the file system. Similar to how i-nodes stores metadata of files, Superblocks store metadata of the file system. As it stores critical information about the file system, preventing corruption of superblocks is of utmost importance.

If the superblock of a file system is corrupted, then you will face issues while mounting that file system. The system verifies and modifies superblock each time you mount the file system.

Superblocks also stores configuration of the file system. Some higher level details that is stored in superblock is mentioned below.

  • Blocks in the file system
  • No of free blocks in the file system
  • Inodes per block group
  • Blocks per block group
  • No of times the file system was mounted since last fsck.
  • Mount time
  • UUID of the file system
  • Write time
  • File System State (ie: was it cleanly unounted, errors detected etc)
  • The file system type etc(ie: whether its ext2,3 or 4).
  • The operating system in which the file system was formatted

The primary copy of superblock is stored in the very first block group. This is called primary superblock, because this is the superblock that is read by the system when you mount the file system. As block groups are counted from 0, we can say that the primary superblock is stored at the beginning of block group 0.
As superblock is a very critical component of the file system, a backup redundant copy is placed at each "block group".
In other words, every "block group" in the file system will have the backup superblock. This is basically done to recover the superblock if the primary one gets corrupted.

You can easily imagine that storing backup copies of superblock in every "block group", can consume a considerable amount of file system storage space. Due to this very reason, later versions implemented a feature called "sparse_super" which basically stores backup superblocks only on block groups 0, 1 and powers of 3,5,7. This option is by default enabled in latest system's, due to which you will see backup copies of superblock only on several block groups(which is evident from the mke2fs output shown in the previous section).

How to view Superblock Information of a File System?


You can view superblock information of an existing file system using dumpe2fs command as shown below.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
root@localhost:~# dumpe2fs -h /dev/xvda1
dumpe2fs 1.42.9 (4-Feb-2014)
Filesystem volume name:   cloudimg-rootfs
Last mounted on:          /
Filesystem UUID:          f75f9307-27dc-4af8-87b7-f414c0fe280f
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              6553600
Block count:              26212055
Reserved block count:     1069295
Free blocks:              20083290
Free inodes:              6470905
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      505
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Sat Sep 27 13:05:57 2014
Last mount time:          Mon Feb  2 14:43:31 2015
Last write time:          Sat Sep 27 13:06:55 2014
Mount count:              4
Maximum mount count:      20
Last checked:             Sat Sep 27 13:05:57 2014
Check interval:           15552000 (6 months)
Next check after:         Thu Mar 26 13:05:57 2015
Lifetime writes:          305 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:              256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       396056
Default directory hash:   half_md4
Directory Hash Seed:      2124542b-ea2f-4552-afaa-c5720283d2cd
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             128M
Journal length:           32768
Journal sequence:         0x0151d29d
Journal start:            11415<span style="font-size:16px;">
</span>

You can also view the exact locations of superblock and backups using the same dumpe2fs command as shown below.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
root@localhost:~# dumpe2fs /dev/xvda1 | grep -i superblock
dumpe2fs 1.42.9 (4-Feb-2014)
  Primary superblock at 0, Group descriptors at 1-7
  Backup superblock at 32768, Group descriptors at 32769-32775
  Backup superblock at 98304, Group descriptors at 98305-98311
  Backup superblock at 163840, Group descriptors at 163841-163847
  Backup superblock at 229376, Group descriptors at 229377-229383
  Backup superblock at 294912, Group descriptors at 294913-294919
  Backup superblock at 819200, Group descriptors at 819201-819207
  Backup superblock at 884736, Group descriptors at 884737-884743
  Backup superblock at 1605632, Group descriptors at 1605633-1605639
  Backup superblock at 2654208, Group descriptors at 2654209-2654215
  Backup superblock at 4096000, Group descriptors at 4096001-4096007
  Backup superblock at 7962624, Group descriptors at 7962625-7962631
  Backup superblock at 11239424, Group descriptors at 11239425-11239431
  Backup superblock at 20480000, Group descriptors at 20480001-20480007
  Backup superblock at 23887872, Group descriptors at 23887873-23887879


How can I use backup superblocks to recover a corrupted file system?


The first thing to do is to do a file system check using fsck utility. This is as simple as running fsck command against your required file system as shown below.


1
root@localhost:~# fsck.ext3 -v /dev/xvda1


If fsck output shows superblock read errors, you can do the below to fix this problem.

First step is to Identify where the backup superblocks are located. This can be done by the earlier shown method of using dumpe2fs command OR using the below command also you can find the backup superblock locations.

1
root@localhost:~# mke2fs -n /dev/xvda1


-n option used with mke2fs in the above example, will show the backup superblock locations, without creating an file system. Read mke2fs man page for more information on this command line switch.
Second step is to simply restore the backup copy of superblock using e2fsck command as shown below.

1
root@localhost:~# e2fsck -b 32768 /dev/xvda1


In the above shown example the number 32768 i have used is the location of the first backup copy of the superblock. Once the above command succeeds, you can retry mounting the file system.

Alternatively you can also use sb option available in mount command. sb option lets you specify the superblock to use while mounting the file system. As mentioned earlier in the article, when you mount a file system, by default the primary superblock is read. Instead you can force mount command to read a backup superblock in case the primary one is corrupted. Below shown is an example mount command using a backup superblock to mount a file system.

1
root@localhost:~# mount -o -sb=98304 /dev/xvda1 /data


The above shown mount command will use backup superblock located at block 98304 while mounting.

Comments

Popular posts from this blog

Boot process hangs at dracut: Switching root

Environment Red Hat Enterprise Linux 6 Issue When server is booting the boot process hangs at  dracut: Switching root , and never displays anything else. Raw device-mapper: ioctl: 4.33.1-ioctl (2015-8-18) initialised: xx-xxxx@redhat.com udev: starting version 147 dracut: Starting plymouth daemon dracut: rd_NO_DM: removing DM RAID activation dracut: rd_NO_MD: removing MD RAID activation scsi0 : ata_piix scsi1 : ata_piix ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc120 irq 14 ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc128 irq 15 Refined TSC clocksource calibration: 2599.999 MHz. virtio-pci 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11 virtio-pci 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10 virtio-pci 0000:00:07.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11 virtio-pci 0000:00:08.0: PCI INT A -> Link[LNKD] -> GSI 11 (level, high) -> IRQ 11 input: ImExPS/2 G...

Interpreting the output of lspci

On Linux, the lspci command lists all PCI devices connected to a host (a computer). Modern computers and PCI devices communicate with each other via PCI Express buses instead of the older Conventional PCI and PCI-X buses since the former buses offer many advantages such as higher throughput rates, smaller physical footprint and native hot plugging functionality. The high performance of the PCI Express bus has also led it to take over the role of other buses such as AGP ; it is also expected that SATA buses too will be replaced by PCI Express buses in the future as solid-state drives become faster and therefore demand higher throughputs from the bus they are attached to (see this article for more on this topic). As a first step, open a terminal and run lspci without any flags (note: lspci may show more information if executed with root privileges): lspci   This is the output I get on my laptop: 00:00.0 Host bridge: Intel Corporation Haswell-ULT DRA...

How to get the SAN environment information and statistics on AIX, HP-UX, Linux, Solaris, and Windows

How to get the SAN environment information and statistics on AIX, HP-UX, Linux, Solaris, and Windows Description NetBackup SAN Client is supported on the Linux , Solaris, Windows, HP-UX and AIX operating systems.  These environments provide the initiator device driver which can login to the SAN client media server and mount an pseudo   target device “ARCHIVE PYTHON” so that the backup or restore can be use the fiber transport (FT).  If there is an issue in the SAN environment, it is necessary to get the information/statistics from the SAN fabric for analysis.  The commands below can be used, on the respective operating system, to gather the necessary information. If the outputs show many or steadily increasing error counts, that indicates one or more issues with  the fabric  infrastructure. The issue(s) can be caused by cabling, SFP, san switch, DWDM, HBA or ISL and those components will need to be analyzed and...