
Solving NFS Mounts at Boot Time

Let’s face it. NFS is a magical thing. It allows you to centralize your storage and share volumes across systems, all while maintaining sane permissions and ownership. Unfortunately, it can also be a bit of a fickle beast. Let’s say you just had your volume configured and you set up the mounts. You go and run this command:

mount -t nfs 10.10.10.1:/vol1/fs1 /data
 
Works like a champ, you now have your data partition mounted over NFS. So you add this line to your /etc/fstab and make it mount automagically.
10.10.10.1:/vol1/fs1 /data nfs defaults 0 0
A few weeks go by and a kernel update arrives. No big deal: you apply the updates and reboot during your next maintenance window to load the new kernel. Then you start to see applications failing and notice the volume isn’t actually mounted. This is an unfortunate result of the automounter subsystem (the term is used loosely here for the boot-time mount step, not the autofs daemon).
It’s like this. At boot time the root partition gets mounted, automounter reads the /etc/fstab file, and mounts any filesystem that doesn’t have noauto as a mount option. We’re still very early in the boot process and the network isn’t up yet, so naturally any network filesystems fail. The real problem is that at no point does automounter go back and attempt to remount those filesystems. So your NFS mounts fail because there is no network, and done is done.
The developers were nice enough to provide a fix for this. There exists a mount option called _netdev. If we quote directly from the man page (sourced from RHEL 6.4):
_netdev
The filesystem resides on a device that requires network access (used to prevent the system from attempting to mount these filesystems until the network has been enabled on the system).
This is awesome, and exactly what we want. So you modify your entry in fstab to look like this:

10.10.10.1:/vol1/fs1 /data nfs defaults,_netdev 0 0
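
Before rebooting, it can help to check which network filesystem entries still lack the option. The following is a minimal sketch, not part of any distribution tooling: the function name missing_netdev is hypothetical, and the list of filesystem types it treats as "network" (nfs, nfs4, cifs) is an assumption you may want to adjust.

```shell
#!/bin/sh
# Print the mount point of every network filesystem entry in an
# fstab-style file that does NOT carry the _netdev option.
# Assumption: "network filesystem" here means type nfs, nfs4 or cifs.
missing_netdev() {
    fstab="${1:-/etc/fstab}"
    awk '
        /^[ \t]*#/ || NF < 4 { next }      # skip comments and malformed lines
        $3 ~ /^(nfs|nfs4|cifs)$/ {
            opts = "," $4 ","              # pad so each option is comma-delimited
            if (opts !~ /,_netdev,/) print $2
        }
    ' "$fstab"
}
```

Calling missing_netdev with no argument reads /etc/fstab; passing a path lets you try it against a copy first.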
 
You’ve been bitten by NFS mounting in the past, so you throw this into your test environment and reboot immediately. After the system comes up you notice a problem: your NFS volumes are still unmounted. You see, there’s a bit of a hitch. Automounter followed the same procedure that it did before, except this time it didn’t even attempt to mount /data. The _netdev option doesn’t tell the system to mount the filesystem when the network comes up; it says don’t attempt to mount it at all if the network isn’t up. There is still a missing piece to the puzzle. If you look at your init scripts, there is a service called netfs. If you read the script, you can see this description in the chkconfig header:
# description: Mounts and unmounts all Network File System (NFS),
# CIFS (Lan Manager/Windows), and NCP (NetWare) mount points.
This is exactly what you need. It is a service whose sole purpose is to read your /etc/fstab and mount network filesystems. All you have to do is enable it
chkconfig netfs on
and watch the magic happen. Now your mount boot process should look something like this:
  1. Automounter reads /etc/fstab
  2. Ignores /data since it has _netdev option set
  3. Mounts all other filesystems
  4. Finishes mount jobs and allows system to continue booting
  5. Network comes up
  6. Service netfs started
  7. netfs reads /etc/fstab and finds an nfs filesystem
  8. netfs mounts /data
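
The sequence above can be sketched as a small classifier that reports, for each fstab entry, which stage would handle it. This is a simplified model of the description in this post, not the logic of the real init scripts; the function name classify_fstab and the stage labels (early, netfs, skipped) are made up for illustration.

```shell
#!/bin/sh
# Model the boot sequence described above for an fstab-style file:
#   skipped -> entry has noauto, never mounted automatically
#   netfs   -> entry has _netdev, left for the netfs service
#   early   -> everything else, mounted before the network is up
# Assumption: this mirrors the article's description only.
classify_fstab() {
    awk '
        /^[ \t]*#/ || NF < 4 { next }      # skip comments and malformed lines
        {
            opts = "," $4 ","              # pad so each option is comma-delimited
            if (opts ~ /,noauto,/)        stage = "skipped"
            else if (opts ~ /,_netdev,/)  stage = "netfs"
            else                          stage = "early"
            print stage, $2
        }
    ' "${1:-/etc/fstab}"
}
```

Running classify_fstab against the fstab from this post would list /data under netfs, which is the behavior you want to see before trusting a reboot.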
What’s funny is that while I was researching this problem I never stumbled across netfs as a service. I had even gone so far as to start planning out my own custom init script that would do exactly this, except specifically for my mount points instead of generalizing. It’s nice to see that I was on the right track, but even better that the tools already existed.



What happens after adding the _netdev option in fstab?

While Linux boots, the process that mounts the file systems listed in /etc/fstab cannot mount the NFS mount points in that file, because the network service has not started yet. Hence Red Hat recommends adding the _netdev option for NFS file systems in /etc/fstab, so that they are mounted after the networking service starts.

But what actually mounts them behind the scenes?
Let’s figure that out.

I am currently in the runlevel 3 directory. This is the runlevel we mostly use by default to boot Linux.

[root@Node1 rc3.d]# pwd
/etc/rc.d/rc3.d

This directory shows the order in which scripts or services are started and killed.
Note: The output below is trimmed to keep this article concise.

[root@Node1 rc3.d]# ll
lrwxrwxrwx. 1 root root 17 Sep 30 01:36 S10network -> ../init.d/network
lrwxrwxrwx. 1 root root 15 Sep 30 01:36 S25netfs -> ../init.d/netfs
lrwxrwxrwx. 1 root root 14 Sep 30 02:29 S26gfs2 -> ../init.d/gfs2

By default the network service starts at priority 10 and the netfs service at priority 25.
If we have added the _netdev option in /etc/fstab for an NFS mount point, netfs takes responsibility for mounting that NFS file system after the network service starts.
If your NFS file systems are still not mounted after adding the _netdev option in /etc/fstab, check whether the netfs service is set to start during boot.

[root@Node1 rc3.d]# chkconfig --list | grep netfs
netfs           0:off   1:off   2:off 3:on     4:on     5:on    6:off
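
The ordering check can also be scripted. This is a rough sketch that assumes the SysV layout shown above (SNN-prefixed symlinks in /etc/rc.d/rc3.d); the helper names start_prio and netfs_after_network are hypothetical and not part of chkconfig.

```shell
#!/bin/sh
# Print the two-digit start priority of a service in an rc directory,
# e.g. "25" for S25netfs. Returns non-zero if no start link exists.
start_prio() {
    for link in "$1"/S[0-9][0-9]"$2"; do
        # an unmatched glob stays literal, so verify the entry really exists
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}        # strip the directory part -> S25netfs
        prio=${name#S}          # strip the leading S      -> 25netfs
        echo "${prio%"$2"}"     # strip the service name   -> 25
        return 0
    done
    return 1
}

# Succeed only if netfs has a start link with a higher priority number
# (a later start) than network in the given rc directory.
netfs_after_network() {
    dir="${1:-/etc/rc.d/rc3.d}"
    net=$(start_prio "$dir" network) || return 1
    nfs=$(start_prio "$dir" netfs) || return 1
    [ "$nfs" -gt "$net" ]
}
```

For example, `netfs_after_network && echo "order looks fine"` prints the message only when both start links exist and netfs is ordered after network.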

If you still have problems mounting NFS file systems during the boot process, the last option is to add NETWORKDELAY=60 to the file below.

[root@Node1 rc3.d]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=Node1
NETWORKDELAY=60

Tip: As shown above, GFS2 file systems are not mounted early in the boot either. The gfs2 service (priority 26) starts after netfs (priority 25), so GFS2 mounts happen even later than the NFS mounts.
