BProc Design and Implementation

The Beowulf Distributed Process Space (BProc) is a set of kernel modifications, utilities and libraries which allow a user to start processes on other machines in a Beowulf-style cluster. Remote processes started with this mechanism appear in the process table of the cluster's front end machine. This allows remote processes to be managed with the normal UNIX process control facilities. Signals are transparently forwarded to remote processes, and exit status is received using the usual wait() mechanisms.
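To make "the normal UNIX process control facilities" concrete, here is a minimal sketch of the signal-and-wait pattern using an ordinary local child process. It does not use BProc at all. The function name signal_and_reap is made up for this illustration. The point is only that, per the description above, the same kill() and wait() calls made on the front end would apply to a process that is actually running on another machine.

```python
import os
import signal
import time


def signal_and_reap(sig=signal.SIGTERM):
    """Fork a child, signal it, and return (pid, number of the signal that killed it).

    Local illustration only: under BProc the child could live on another
    machine while the parent makes these same calls on the front end.
    """
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: make sure the signal has its default (terminating) action,
        # tell the parent we are ready, then idle until signalled.
        try:
            os.close(ready_r)
            signal.signal(sig, signal.SIG_DFL)
            os.write(ready_w, b"x")
            while True:
                time.sleep(1)
        finally:
            os._exit(1)

    # Parent: wait until the child is ready so the signal cannot arrive early.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)

    os.kill(pid, sig)                # ordinary kill()
    _, status = os.waitpid(pid, 0)   # ordinary wait()
    term_sig = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return pid, term_sig
```

Calling signal_and_reap() returns the child's PID and the signal number reported by waitpid(), which is how the parent learns that the child died from the signal it sent.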

Motivation

rsh and rlogin are a lousy way to interact with the machines in a cluster. Being able to log into any machine in the cluster instantly requires a large amount of software and configuration to be present on every machine. You will need things like shells for people to log in to. You will need an up-to-date password database. You'll need all the little programs that people expect to see on a UNIX system for people to be comfortable using the system. You'll probably also need all the setup scripts and associated configuration information to get the machines up to the point where they're actually usable by the users. That sucks. There's an awful lot of configuration there. With a large number of machines, it's also very easy for the users to make a mess. Runaway processes are a problem.

The goal of BProc is to change the model of the cluster from a pile of PCs to a single machine with a collection of network-attached compute resources. And, of course, to do away with rsh and rlogin in the cluster environment.

Once we do away with the interactive logins, we are left with two basic needs. We need a way to start processes on remote machines and, most importantly, we need a way to monitor and control what's going on on the remote machines.

BProc provides process migration mechanisms which allow a process to place copies of itself on remote machines via a remote fork system call. Child processes created via this mechanism are all visible in the front end's process tree.
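The shape of a program built around a remote fork can be sketched with plain fork(). In the sketch below, rfork_stub is a hypothetical stand-in and not a BProc call: it forks locally and ignores the node argument, and it assumes the remote fork follows the usual fork() convention of returning 0 in the child and the child's PID in the parent. run_on_nodes is likewise a made-up name. Each child exits with its node number so that the parent can tell the children apart when it reaps them.

```python
import os


def rfork_stub(node):
    """Hypothetical stand-in for a remote fork.

    It forks on the local machine and ignores `node`; a real remote fork
    would place the child on that node instead.
    """
    return os.fork()


def run_on_nodes(nodes):
    """Start one child per node and return {node: exit code} as seen by the parent."""
    children = {}
    for node in nodes:
        pid = rfork_stub(node)
        if pid == 0:
            # Child: with a real remote fork this code would now be
            # running on `node`. Report the node number as the exit code.
            os._exit(node % 256)
        # Parent: the child is an ordinary entry in our process tree.
        children[pid] = node

    results = {}
    for pid, node in children.items():
        _, status = os.waitpid(pid, 0)
        results[node] = os.WEXITSTATUS(status)
    return results
```

For example, run_on_nodes([0, 1, 2]) starts three children and collects one exit code per node through the parent's normal waitpid() calls.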

The central idea in BProc is a distributed process ID (PID) space. Every instance of Linux has a process space: a pool of process IDs and a process tree. BProc takes the process space of the front end machine and allows portions of it to exist on the other machines in the cluster. The machine distributing pieces of its process space is the master machine, and the machines accepting pieces of it to run are the slave machines.
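One way to picture this is a toy model in which the master owns every PID and records which machine each process runs on. This is only an illustration of the idea, not how BProc is implemented; the class name, its methods and the MASTER marker are all invented for this sketch.

```python
class MasterProcessSpace:
    """Toy model of a distributed PID space (illustration only)."""

    MASTER = -1  # made-up marker meaning "runs on the master itself"

    def __init__(self, first_pid=1000):
        self._next_pid = first_pid
        self._location = {}  # pid -> node number, or MASTER

    def create(self, node=MASTER):
        """Allocate a PID from the master's pool and place the process on `node`."""
        pid = self._next_pid
        self._next_pid += 1
        self._location[pid] = node
        return pid

    def where(self, pid):
        """Return the node a PID lives on, e.g. to decide where to forward a signal."""
        return self._location[pid]

    def master_view(self):
        """Every PID, local or remote, is visible on the master."""
        return sorted(self._location)

    def slave_view(self, node):
        """A slave holds only the portion of the PID space placed on it."""
        return sorted(p for p, n in self._location.items() if n == node)
```

In this model the master's view is the union of all the slave views plus its own local processes, and no PID is ever allocated anywhere except on the master.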