Thursday 30th August, 2007
Thoughts on user-land semantics for directed-graph filesystems: The filing systems associated with most modern operating systems are tree-like in nature: there exists a root directory which contains subdirectories, which contain subdirectories below them, and so forth (for the purposes of this writing, files do not exist; as they do not play an active part in the directory tree, they can be largely ignored).

This is an attempt to create a userland semantics for a roughly UNIX-like filesystem that nonetheless is a directed graph rather than a simple tree.

It is important to note at this stage that filenames in the UNIX filesystem are in fact not node names at all, but edge names. Files are uniquely identified by a node number on disc; and by that, every instance of a file in the filesystem has the same node number. However, every instance of a file in the filesystem can have a different name; see the following example:

~$ mkdir a b
~$ touch a/dugong
~$ ln a/dugong b/walrus

dugong and walrus are now two names for the same on-disc file; in other words, this part of the filesystem now looks like:

Fig. 1

Using the cd command to change directory is to traverse the named edge in the directory tree.

Early versions of UNIX did in fact have a directory graph rather than a directory tree (see Dennis Ritchie: The Evolution of the Unix Time-sharing System); this did not last very long, because it violated a principle herein called (in honour of Tom Stoppard) the Elsinore principle. It made much greater use of hard links than is customary in more recent iterations; notably because to change to a directory that 'contained' the current directory required significant contortions, which made the resulting filesystem considerably more difficult to use and incomprehensible than the tree-based version which succeeded it.

This leads to the statement of the Elsinore principle: that "every entrance is an exit somewhere else" - or, in other words, that doors or edges do not disappear once you have gone through them.

It is trivial to see that the tree-based UNIX filesystem adheres to this principle. The subdirectory relation defines two links between the directories involved, the named link, pointing from the parent to its subdirectory, and another called '..' pointing from the subdirectory to the parent. As soon as one of these edges is traversed, the 'door' remains visible in the form of the other. The UNIX filesystem is a tree so that users can backtrack through the path from which they came. How can this be done in the graph case?

A naïve approach which makes a lot of intuitive sense says to reserve a set of filenames (much as '.' and '..' are reserved), one for each parent. A sensible choice would be filenames beginning with the ^ character. If, in the above example, dugong were a directory rather than a file, its list might look somewhat like:

dugong$ ls -a
. .. ^a ^b

This falls foul of the nature of the filesystem, however: in the UNIX filesystem, edges are labelled, and the parent directories have no inherent name. If the filesystem is a graph, any given directory could have any number of names. This would evidently be extremely confusing for the user, which leads us to another principle: people generally want to know where they are with more urgency than how they got there, and where they are going to be with more urgency than how they are going to get there.

This leads us to what will be referred to here as the Castle principle: name rooms in preference to doors. Name doors as well, if you so desire, but name rooms primarily.

In terms of physical spaces, this seems moderately obvious; going from an unknown place through a door to another unknown place, only to turn around and find that all the doors have been relabelled would be a deeply disorientating experience, and pity the poor user who steps through 'The ill-fated door of Sir Boris the Vertiginous' into a teetering mass of turretry, the only exit from which is directly downwards.

In software, where there are no 'physical' direction cues, this problem is exacerbated.

It should be noted that it is potentially useful to label doors; and so perhaps a 'relabel' command would be a useful thing to have.

Another point: whither '..'? Need we sacrifice this very useful convention?

To a great extent, this convention can be reinstated by storing the full path through which the current directory was reached as the current directory rather than just the node number of the current directory; this acts as a directory stack, where using 'cd' with an absolute path loads the stack from the pathname, 'cd' with a relative path pushes all elements of that path onto the stack, and .. refers to the directory reached through all nodes in the path except the very last. It is unclear how heavy a demand this would place on the computational or storage capacities of the computer, but it seems feasible.

This is only possible if the system contains the possibility of absolute paths; it can be argued (and has been by Jimmy) that if the filesystem is a directed graph, there is no need to have a 'root' directory, and rather base each user in their home directory. However, while backed by sound spatial reasoning (if the home directory of the user is identified with their desktop, then the 'visual' root of the screen corresponds to the conceptual root of the filesystem) this falls foul of the Castle principle, as places on the computer no longer have place names that are independent of the viewpoint of the user using it. In addition, the operating system needs to know where to find its own system files. It thus does make a good deal of sense to have a single 'root' point from which all edges are outgoing. Even the archaic UNIX graph filesystem had a similar concept to this; although it had no pathnames, every directory had a special 'dd' directory which linked to the directory of home directories.

That being said, why make a distinction between outgoing and incoming edges to a directory? The answer to this one is quite simple from a user's perspective; if all edges are equal then all recursive operations will eventually spread out to fill the entire filesystem, making most kinds of operations on directories at worst extremely space-hungry and marginally useless and at worst actively dangerous.

Some problems with this approach:

posted by Rob Mitchelmore, 04:08 (anchor)
Saturday 25th August, 2007
Some notes on Schenck's paintings 'Anguish' and 'The Orphan'.
posted by Rob Mitchelmore, 20:50 (anchor)
June 2015May 2015April 2015June 2014
January 2014November 2013October 2013July 2013
April 2013March 2013January 2013November 2012
older posts