Beginning a new Advanced Namespace Tools development blog

31 May 2016

The purpose of this blog is to document, in depth, for myself and others, the process of developing and using the Plan 9 Advanced Namespace Tools (ANTS). As the blog begins, ANTS as a collection is over three years old, with many of its components originating in 2009-2010. I will be working through all of the tools, explaining how they are used and how they fit together, and documenting the process of testing and improving them.

To make the blog accessible and to clarify my own thinking, I will write for an audience that is familiar with coding and the standard unix shell environment, but does not necessarily know much about Plan 9. This first post provides an overview of the original Plan 9 design for networked, distributed systems, and of how ANTS builds on and modifies it.

The original Plan 9 distributed design

A network of Plan 9 systems is divided into three main roles: file servers, cpu servers, and terminals. The file server provides data storage, the cpu server provides the execution environment where programs run, and the terminal provides the user interface. System resources are all presented via filesystem operations and can be shared across the network using the 9P protocol.
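
For example, a terminal can attach a remote file server's tree over 9P with the srv command, which dials an address, posts the connection in /srv, and mounts it. The host and names here are hypothetical (564 is the standard 9P-over-TCP port):

    % srv tcp!fshost!564 fsconn /n/remote    # dial fshost, post /srv/fsconn, mount at /n/remote
    % ls /n/remote                           # remote files now appear as ordinary files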

Plan 9 allows each process to have its own view of the file namespace, and supports union-binding multiple paths to a single directory. In combination with network transparency and the consistent use of the file abstraction, this gives networks of Plan 9 machines tremendous flexibility in how they share resources and present them to the user.
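
As a quick illustration, here is the canonical union bind from a user's profile, followed by a command that prints the namespace so you can inspect the result:

    % bind -a $home/bin/rc /bin    # union: search private rc scripts after the standard binaries
    % ns | grep /bin               # show this process's namespace entries for /bin

Because namespaces are per-process, this change is visible only to the current shell and the processes it spawns.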

In practice, there are some obstacles to reaching the Data Paradise. A traditional Plan 9 from Bell Labs setup makes use of several components with sequential dependencies. First, you need the Venti archival data server, which does not provide a filesystem directly, but rather a deduplicated block store upon which filesystems can be built. The file server itself is called Fossil. There is also an authentication system, in which a key-agent program called 'factotum' communicates with an authentication server to obtain authorization to access a machine.
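
To make the dependencies concrete, here is a sketch of a plan9.ini fragment for a Fossil file server backed by a remote Venti; the device and address values are invented for illustration:

    # hypothetical plan9.ini fragment for a fossil file server
    bootargs=local!#S/sdC0/fossil    # root filesystem is the local fossil partition
    venti=tcp!192.168.1.10!17034     # the venti block store that fossil reads from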

So: first, boot up a server which provides Venti service, and let's make it the Auth server too. Now we can boot up a Fossil server reading from the Venti. Next, we can boot a CPU server, which will get permission from the Auth server to access the Fossil. At this point, we are finally ready to boot our user terminal, which will dial the CPU and Auth servers and create our user environment.
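
Seen from the terminal at the end of the chain, the final steps look something like this (server names hypothetical):

    % 9fs fsserver            # mount the file server's tree at /n/fsserver
    % cpu -h tcp!cpuserver    # run a shell on the cpu server; factotum negotiates with the auth server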

How the Advanced Namespace Tools change things

This chain of networked sequential dependencies is challenging to configure and administer, and it is fragile in the face of network disruptions. The design assumed that the file/cpu/auth servers would be centralized and professionally administered, but Plan 9 never saw enough adoption for many large organizations to build out Plan 9 server data centers.

My experience in the context of home and hobby use was that the system was too 'brittle' - a networking glitch, or the need to reboot one system, meant that the entire collection of systems would freeze into an unresponsive state. The ability to control the machines depended on the user environment not breaking, and my collection of old hardware and virtual machines was not sufficiently reliable.

The solution I came up with was to leverage the power of independent namespacing to make the grid more reliable. Each ANTS node has an independent "service namespace" which is created at boot and does not depend on other nodes. The user environment namespace can then be built in the typical distributed fashion. This separates the concern of the user environment (built from a distributed collection of resources) from the concern of node stability (each node can boot and be managed independently).
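
Here is a minimal rc sketch of the pattern, assuming hypothetical host and path names; it illustrates the idea, not the actual ANTS boot code:

    #!/bin/rc
    # sketch: build a self-contained local service namespace first
    ramfs -m /tmp          # scratch storage served from local ram; no network needed
    bind -a '#I' /net      # make the local IP stack visible in this namespace
    # ...start local consoles, listeners, and admin tools here...

    # only afterwards, and only if the grid answers, graft in shared resources
    srv tcp!fshost!564 gridfs /n/gridfs && bind -a /n/gridfs/rc/bin /bin

The key property is that everything above the final line uses only local resources, so the node boots and stays controllable even when fshost is unreachable.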