Implementation

BProc consists of four basic pieces. On the master node, "ghost processes" are placeholders in the process tree that represent remote processes. The master daemon is the message router for the system and also maintains the state information about which processes exist where. On the slave nodes, process ID masquerading is a system of lying to processes so that they appear (to themselves) to be in the master's process space. A simple daemon on the slave side is mostly just a message pipe between the slave's kernel and the network.

Ghost Processes

Code reuse is good. BProc tries to recycle as much of the kernel's existing process infrastructure as possible. The UNIX process model is well thought out and certainly well understood. All the details of the UNIX model have been hammered out and it works well. Rather than try to change or simplify it, BProc tries to keep it entirely. Rather than creating some new kind of semi-bogus process tree, BProc uses the existing tree and fills the places which represent remote processes with lightweight "ghost" processes.

Ghost processes are normal processes except that they lack a memory space and open files. They resemble kernel threads like kswapd and kflushd. It is possible for ghosts to wake up and run on the front end. They have their own status (e.g. sleeping, running) which is independent of the remote processes they represent. Most of the time, however, they sleep and wait for the remote process to request one of the few operations which are performed on its behalf.

Ghost processes mirror portions of the status of the remote process. The status includes information such as the process state and the amount of CPU time that it has used so far. This alternate status is what gets presented to user space in the procfs filesystem. It is updated on demand (via a request to the real process) and no more often than every 5 seconds.
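The refresh rule above (update on demand, but at most once every 5 seconds) can be sketched as a small cache. This is only an illustration in Python, not BProc's kernel code: the class name GhostStatus, the fetch_remote callback and the injectable clock are all hypothetical.

```python
import time

REFRESH_INTERVAL = 5.0  # seconds; the minimum gap between refreshes stated above


class GhostStatus:
    """Cached copy of a remote process's status, refreshed on demand.

    Hypothetical sketch: fetch_remote stands in for the request that is
    sent to the real process.
    """

    def __init__(self, fetch_remote, clock=time.monotonic):
        self._fetch_remote = fetch_remote
        self._clock = clock
        self._cached = None
        self._fetched_at = None

    def read(self):
        """Return the status, asking the remote side only if the copy is stale."""
        now = self._clock()
        stale = (self._fetched_at is None
                 or now - self._fetched_at >= REFRESH_INTERVAL)
        if stale:
            self._cached = self._fetch_remote()
            self._fetched_at = now
        return self._cached
```

A reader that polls the status many times a second therefore triggers at most one request to the slave per 5 second window.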

Ghosts catch and forward signals to the remote process. Since ghosts are kernel threads (not running in user space), they can catch and forward SIGKILL and SIGSTOP. There is no way to get rid of a ghost process without the remote process exiting.

Ghosts perform certain operations on behalf of the real processes they represent. In particular they do fork() and wait(). If a process on a remote machine decides to fork, a new process ID must be allocated for it in the master's process space. Also, we should see a new ghost on the front end when the remote process forks. Having the ghost call fork() accomplishes both of these nicely. Likewise, the ghost process will also clean up the process tree on the front end by performing wait()s when necessary.
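The bookkeeping that fork() and wait() by the ghost provide can be modelled in a few lines. The sketch below is a user-space toy in Python, not the kernel mechanism: MasterProcessTree and its method names are hypothetical, and a real ghost simply calls the kernel's own fork() and wait().

```python
import itertools


class MasterProcessTree:
    """Toy model of the front end's process tree (hypothetical names)."""

    def __init__(self, first_pid=100):
        self._next_pid = itertools.count(first_pid)
        self.parent_of = {}  # pid -> parent pid (None for a root)
        self.zombies = {}    # pid -> exit status, waiting to be reaped

    def spawn_root(self):
        pid = next(self._next_pid)
        self.parent_of[pid] = None
        return pid

    def ghost_fork(self, parent_pid):
        """The ghost forks for its remote process: this allocates a PID in the
        master's space and creates the child ghost in one step."""
        if parent_pid not in self.parent_of:
            raise KeyError(parent_pid)
        child_pid = next(self._next_pid)
        self.parent_of[child_pid] = parent_pid
        return child_pid

    def ghost_exit(self, pid, status):
        """Record that the ghost for pid exited with the remote status."""
        self.zombies[pid] = status

    def ghost_wait(self, parent_pid):
        """Reap one exited child of parent_pid; return (pid, status) or None."""
        for pid, status in list(self.zombies.items()):
            if self.parent_of.get(pid) == parent_pid:
                del self.zombies[pid]
                del self.parent_of[pid]
                return pid, status
        return None
```

The PID returned by ghost_fork() is the one the new remote process would see as its own.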

Finally, the ghost will exit() with the appropriate status when the remote process it represents exits. Since the ghost is a kernel thread, it can accurately reflect the exit status of the remote process including states such as killed by a signal and core dumped.
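For reference, these are the distinctions a parent on the front end can see in a wait status. The sketch uses Python's standard os module on a UNIX system; describe_wait_status and run_child are hypothetical helper names, and the code only shows what the parent observes, not how a ghost sets its status inside the kernel.

```python
import os
import signal


def describe_wait_status(status):
    """Decode a wait status into (kind, value, core_dumped)."""
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status), os.WCOREDUMP(status))
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status), False)
    return ("other", status, False)


def run_child(action):
    """Fork a child, run action in it, and decode how the child ended."""
    pid = os.fork()
    if pid == 0:
        try:
            action()
            os._exit(0)
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    return describe_wait_status(status)


if __name__ == "__main__":
    # A normal exit and a death by signal look different to the parent.
    print(run_child(lambda: os._exit(3)))
    print(run_child(lambda: os.kill(os.getpid(), signal.SIGKILL)))
```

An ordinary user-space stand-in could only reproduce the "exited" case directly, which is why it matters that the ghost is a kernel thread.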

Process ID Masquerading

The slave nodes accept pieces of the master's process space. The problem here is that although a process might move to a different machine, it should not appear (to that process) that it has left the process space of the front end. That means things like the process ID can't change, and system calls like kill() should function as if the process were still on the front end. That is, we shouldn't be able to send signals across process spaces to the other processes on the slave node.

Since the slave doesn't control the process space of the processes it's accepting, not all operations can be handled entirely locally either. fork() is a good example.

The solution that BProc uses is to ignore the process ID that a process gets when it's created on the slave side. BProc attaches a second process ID to the process and modifies the process ID related system calls to essentially lie to the process about what its ID is.

Having this extra tag also allows the slave daemon to differentiate the process from the other processes on the system when performing process ID related system calls.
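A minimal sketch of the idea, in Python rather than kernel code: each accepted process carries a second, master-assigned ID, and the ID related calls consult it. Everything here is hypothetical (SlaveProcessTable, attach, kill_target), and two behaviours are assumptions rather than statements about BProc: targets not found on this node are handed to the slave daemon for forwarding, and native slave processes are refused when they target a masqueraded process.

```python
class SlaveProcessTable:
    """Maps the slave's real PIDs to master-assigned (masqueraded) PIDs."""

    def __init__(self):
        self._masq_of = {}   # local pid -> masqueraded pid
        self._local_of = {}  # masqueraded pid -> local pid

    def attach(self, local_pid, masq_pid):
        """Tag a process accepted from the master with its second ID."""
        self._masq_of[local_pid] = masq_pid
        self._local_of[masq_pid] = local_pid

    def getpid(self, local_pid):
        """What a getpid()-style call reports to the calling process."""
        return self._masq_of.get(local_pid, local_pid)

    def kill_target(self, caller_local_pid, target_pid):
        """Resolve the target of a kill()-style call.

        Returns ("local", local_pid) when the signal can be delivered here,
        or ("forward", masq_pid) when it must go out via the slave daemon.
        """
        if caller_local_pid in self._masq_of:
            # Caller lives in the master's process space, so target_pid is
            # interpreted in that space only.
            if target_pid in self._local_of:
                return ("local", self._local_of[target_pid])
            return ("forward", target_pid)  # assumption: master resolves it
        # Caller is a native slave process (assumed policy: it cannot reach
        # into the master's process space).
        if target_pid in self._masq_of:
            raise ProcessLookupError(target_pid)
        return ("local", target_pid)
```

Because a masqueraded caller's target is never looked up among the slave's native PIDs, it cannot signal an unrelated process that merely happens to share the node.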

The Daemons

The master and slave daemons are the glue connecting the ghosts and the real processes together.

Design Principles

BProc's design is based on the following basic principles.

Code reuse is good

BProc uses placeholders called ghosts in the normal UNIX process tree on the front end to represent remote processes. The parent-child relationships are a no-brainer that way, and so is handling signals, wait, etc.

Code reuse is really really good.

Code reuse is even more important in user space since things seem to change so regularly. To avoid having to write our own set of process viewing utilities like ps and top, BProc presents all the information about remote processes in the procfs file system just like the system does for normal processes. As long as we keep up with changes in the procfs file system, all existing and future process viewing/control utilities will continue to work.
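To make the dependency concrete, here is roughly what a process viewer does with a procfs entry. The sketch assumes the Linux /proc/<pid>/stat layout documented in proc(5) (state, parent PID, utime and stime as fields 3, 4, 14 and 15); parse_proc_stat is a hypothetical helper, not part of BProc or of ps.

```python
import os


def parse_proc_stat(line):
    """Parse one /proc/<pid>/stat line into a few commonly used fields."""
    pid_text, _, rest = line.partition(" (")
    # The command name may itself contain spaces or parentheses, so split
    # on the last ") " rather than the first.
    comm, _, tail = rest.rpartition(") ")
    fields = tail.split()
    return {
        "pid": int(pid_text),
        "comm": comm,
        "state": fields[0],             # field 3 in proc(5)
        "ppid": int(fields[1]),         # field 4
        "utime_ticks": int(fields[11]),  # field 14
        "stime_ticks": int(fields[12]),  # field 15
    }


if __name__ == "__main__":
    # Only meaningful where a Linux-style procfs is mounted.
    if os.path.exists("/proc/self/stat"):
        with open("/proc/self/stat") as f:
            print(parse_proc_stat(f.read()))
```

A tool written this way needs no changes to display a remote process, provided the entry for its ghost has the same shape as any other.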

This is especially important in user space since user space programs seem to change very very often.

The System must be bullet proof! (from user space)

Processes can't escape or confuse the management system. Ghosts need to properly forward all signals including SIGKILL and SIGSTOP. There is no way for a ghost to exit without the process it represents also exiting.

Kernels shouldn't talk on the network.

The kernel is a very very bad place to screw up. Try to keep as much as possible outside of kernel space. This includes message routing and all the information about the current state of the machine.

Minimum knowledge

If a piece of the system doesn't really need to know something, don't let it know. The master daemon is the only piece that knows where the processes actually exist. The kernel layers only have a notion of processes that are here or not here. Slaves don't know what node number they are.
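The division of knowledge can be sketched as a router that alone holds the process-to-node table. This is a Python illustration under assumed names (MasterDaemon, connect_slave, process_moved, route); BProc's actual message formats and daemon interfaces are not described here.

```python
class MasterDaemon:
    """Toy message router: the only place that maps processes to nodes."""

    def __init__(self):
        self._node_of = {}  # pid -> node number, known to the master only
        self._links = {}    # node number -> callable delivering to that slave

    def connect_slave(self, node, send):
        """Register the link to a slave; the slave is not told its number."""
        self._links[node] = send

    def process_moved(self, pid, node):
        self._node_of[pid] = node

    def process_exited(self, pid):
        self._node_of.pop(pid, None)

    def route(self, pid, message):
        """Deliver message to wherever pid lives; False if it is unknown."""
        node = self._node_of.get(pid)
        if node is None:
            return False
        # The slave receives only (pid, message), never a node number.
        self._links[node](pid, message)
        return True
```

The receiving side only ever learns that a process is "here"; moving a process to another node changes one entry in the master's table and nothing else.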