Advanced Namespace Tools blog

7 January 2017

New Boot Option: Cpu Server Root with Aan

Context and Motivation

The traditional method of booting a Plan 9 system from a filesystem across the network is "tcp boot": the kernel is loaded from local storage (in contrast to PXE boot) and then dials a 9P fileserver, which provides the root filesystem. This is an authenticated but unencrypted connection (the 90s were a much more innocent era on the net). 9front has added a new boot option, "tls boot", which is basically the same as tcp boot but with the connection tunneled through TLS encryption.

A long-standing issue with using a network mount for your root filesystem is the possibility that a temporary network disruption will make the root unreachable, rendering the whole system unusable and forcing a reboot. This was in fact one of the major motivations for the ANTS independent boot namespace: it provides a stable environment with no dependence on the root fileserver, so you can continue to use a system that becomes disconnected from a remote root, or needs to stop its local disk fileserver, and repair or restart the userspace environment without rebooting.

Another tool available to help avoid problems with unreliable network connections is aan, a program which provides "Always Available Network" service to clients. Aan proxies communications, and if the connection goes bad, it periodically redials the server and reestablishes the connection without disrupting the application. This service is, however, not currently available as a flag-selectable option for any programs except import and cpu. Tcp/tls boot use srv, srvtls, and mount to establish the fileserver connection. An IRC discussion with mveety reminded me that I had already experimented with taking a remote root from a cpu server exportfs, and that I could add a boot option to the ANTS plan9rc script to make an aan-wrapped import of a root filesystem.
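
For reference, the flag-selectable form mentioned above looks roughly like the following on a running system. This is a sketch only: it assumes that -p is the flag that pushes the aan filter in both import and cpu (check import(4) and cpu(1) on your system), and "cpuserver" is a hypothetical host name, not one from my setup.

```rc
# 'cpuserver' is a hypothetical host name; substitute your own cpu server.
# -p is assumed to request the aan filter, per import(4) and cpu(1).

# import the remote root through aan and look at it
import -p cpuserver / /n/remote
ls /n/remote

# start a cpu session that is also protected by aan
cpu -p -h cpuserver
```

Nothing comparable exists for the srv/srvtls/mount sequence used by tcp and tls boot, which is why the new boot option goes through an import instead.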

Adding Aan boot to Plan9rc

Adding this support was easy, because the basic logic is identical to tcp/tls boot. Here is "case aan" from the dogetrootfs fn:

case aan
	if(! ~ $ipdone yes){
		ipsetup
	}
	if(~ $fs ''){
		echo 'fs is [ ' $fs ' ]?'
		getans fs $fs
	}
	if(~ $auth '')
		auth=$fs
	if(~ $factotum terminal)
		/boot/factotum -a $auth
	echo srv $fs to /srv/boot...
	if(~ $fs tcp*)
		rimport -p -s boot $fs / /n/bootimp
	if not
		rimport -p -s boot tcp!$fs!17019 / /n/bootimp
	bootsrv=/srv/boot

Apart from the two "rimport" command lines, this is identical to the tcp and tls logic. The only other changes needed were minor additions to the options parsing logic:

echo -n 'root is from (tcp,tls,aan,local)['$bootargs']: '
bootanswer=`{read}
if(~ $bootanswer rc){
	rc -i
}
if not if(! ~ $bootanswer ''){
	bootparse=`{echo $bootanswer}
}
if not{
	bootparse=`{echo $bootargs}
}
if(~ $bootparse(1) aan*){
	getrootfs=aan
[...]
if(~ $getrootfs tcp || ~ $getrootfs tls || ~ $getrootfs aan){
	if (~ $fs ''){
		echo 'fileserver is [ ' $fs ' ]?'
		getans fs $fs
	}
	if (~ $auth ''){
		auth=$fs
		echo 'auth is [ ' $auth ' ]?'
		getans auth $auth
	}
}
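
Since the parsing above falls back to $bootargs when nothing is typed at the prompt, and only asks about $fs and $auth when they are empty, an unattended aan boot could plausibly be configured from plan9.ini along these lines. This is a hedged sketch rather than a tested configuration: it assumes the plan9.ini variables reach plan9rc as environment variables in the usual way, and the addresses are placeholders.

```
# hypothetical plan9.ini fragment; the addresses are placeholders
# first word of bootargs must match aan* to select the aan case
bootargs=aan
# cpu server whose exportfs provides the root
fs=192.168.0.2
# auth server; plan9rc defaults this to $fs when it is unset
auth=192.168.0.2
```

With bootargs set this way, pressing enter at the "root is from" prompt should select the aan case without further typing.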

To make these options usable at boot, the frontmods/boot/bootfs.proto file was adjusted to include aan and the basename utility used by the rconnect helper script. With all of this done, I tested booting via aan from a cpu server, and everything worked well. The output of "ls /" looks slightly unusual, with quite a few "doubled" directories, because unlike a disk-only fileserver, a cpu server's root exportfs also includes synthetic file trees such as /proc. This does not cause issues, because they sit "behind" the local kernel's files in the union, so you don't get strange results such as seeing two servers' worth of processes in ps -a.