This is an attempt to create a userland semantics for a roughly UNIX-like filesystem that nonetheless is a directed graph rather than a simple tree.
It is important to note at this stage that filenames in the UNIX filesystem are in fact not node names at all, but edge names. Files are uniquely identified by a node number on disc; and by that, every instance of a file in the filesystem has the same node number. However, every instance of a file in the filesystem can have a different name; see the following example:
~$ mkdir a b
~$ touch a/dugong
~$ ln a/dugong b/walrus
dugong and walrus are now two names for the same on-disc file; in other words, this part of the filesystem now looks like:
cd command to change directory is to traverse the named edge in the directory tree.
Early versions of UNIX did in fact have a directory graph rather than a directory tree (see Dennis Ritchie: The Evolution of the Unix Time-sharing System); this did not last very long, because it violated a principle herein called (in honour of Tom Stoppard) the Elsinore principle. It made much greater use of hard links than is customary in more recent iterations; notably because to change to a directory that 'contained' the current directory required significant contortions, which made the resulting filesystem considerably more difficult to use and incomprehensible than the tree-based version which succeeded it.
This leads to the statement of the Elsinore principle: that "every entrance is an exit somewhere else" - or, in other words, that doors or edges do not disappear once you have gone through them.
It is trivial to see that the tree-based UNIX filesystem adheres to this principle. The subdirectory relation defines two links between the directories involved, the named link, pointing from the parent to its subdirectory, and another called '..' pointing from the subdirectory to the parent. As soon as one of these edges is traversed, the 'door' remains visible in the form of the other. The UNIX filesystem is a tree so that users can backtrack through the path from which they came. How can this be done in the graph case?
A naïve approach which makes a lot of intuitive sense says to reserve a set of filenames (much as '.' and '..' are reserved), one for each parent. A sensible choice would be filenames beginning with the
^ character. If, in the above example,
dugong were a directory rather than a file, its list might look somewhat like:
dugong$ ls -a
. .. ^a ^b
This falls foul of the nature of the filesystem, however: in the UNIX filesystem, edges are labelled, and the parent directories have no inherent name. If the filesystem is a graph, any given directory could have any number of names. This would evidently be extremely confusing for the user, which leads us to another principle: people generally want to know where they are with more urgency than how they got there, and where they are going to be with more urgency than how they are going to get there.
This leads us to what will be referred to here as the Castle principle: name rooms in preference to doors. Name doors as well, if you so desire, but name rooms primarily.
In terms of physical spaces, this seems moderately obvious; going from an unknown place through a door to another unknown place, only to turn around and find that all the doors have been relabelled would be a deeply disorientating experience, and pity the poor user who steps through 'The ill-fated door of Sir Boris the Vertiginous' into a teetering mass of turretry, the only exit from which is directly downwards.
In software, where there are no 'physical' direction cues, this problem is exacerbated.
It should be noted that it is potentially useful to label doors; and so perhaps a 'relabel' command would be a useful thing to have.
Another point: whither '..'? Need we sacrifice this very useful convention?
To a great extent, this convention can be reinstated by storing the full path through which the current directory was reached as the current directory rather than just the node number of the current directory; this acts as a directory stack, where using 'cd' with an absolute path loads the stack from the pathname, 'cd' with a relative path pushes all elements of that path onto the stack, and .. refers to the directory reached through all nodes in the path except the very last. It is unclear how heavy a demand this would place on the computational or storage capacities of the computer, but it seems feasible.
This is only possible if the system contains the possibility of absolute paths; it can be argued (and has been by Jimmy) that if the filesystem is a directed graph, there is no need to have a 'root' directory, and rather base each user in their home directory. However, while backed by sound spatial reasoning (if the home directory of the user is identified with their desktop, then the 'visual' root of the screen corresponds to the conceptual root of the filesystem) this falls foul of the Castle principle, as places on the computer no longer have place names that are independent of the viewpoint of the user using it. In addition, the operating system needs to know where to find its own system files. It thus does make a good deal of sense to have a single 'root' point from which all edges are outgoing. Even the archaic UNIX graph filesystem had a similar concept to this; although it had no pathnames, every directory had a special 'dd' directory which linked to the directory of home directories.
That being said, why make a distinction between outgoing and incoming edges to a directory? The answer to this one is quite simple from a user's perspective; if all edges are equal then all recursive operations will eventually spread out to fill the entire filesystem, making most kinds of operations on directories at worst extremely space-hungry and marginally useless and at worst actively dangerous.
Some problems with this approach:
- One clearly cannot expect users to give every file and directory on their system an entirely unique name. Not only would this provoke outright user rebellion, it would also render operations like 'copy directory' entirely useless. How can situations be dealt with where a directory is 'in' two directories which have the same name, such as the not unreasonable case where a directory called 'html' is within two different directories called 'www'? A bruteforce approach would be to call the first-created uplink
^wwwand the second
^www2, but that seems ugly and just pushes the problem further along; one cannot simply label each non-unique parent with its own parent, as it may have multiple parents.
- How does this interact with a spatial browsing approach? All the above is couched in language strongly influenced by the navigational-browsing metaphor enforced by a terminal-centric interaction approach. While the act of moving to a subdirectory is conceptually modelled by the user as moving from one room to another then these assumptions hold; even if a subset-like containment approach is taken, then these assumptions may hold, as any given subset can be a subset of multiple supersets; but what of the conceptual model where subdirectories are truly 'within' each other, nested "like Russian Dolls"? This does not seem too far distant from the subset problem, if you view directory boundaries as permeable, but requires more thought.