
A Commentary on the Sixth Edition Unix Operating System (1977)

One of my favourite books; I've read it a couple of times, and even hacked around with xv6 (the x86 port of the Edition 6 kernel[0][1]). If you do hack around with it in a VM, make sure to add HLT to the idle() function; if nothing else it will save your fans.

One of a small number of books (along with TCP/IP Illustrated[2]) that moved me past the larval hacker stage.

I also met Lions when I was a kid, but didn't put 2 + 2 together until 20 years later!

[0] https://github.com/bringhurst/xv6
[1] https://pdos.csail.mit.edu/6.828/2011/xv6.html
[2] https://www.r-5.org/files/books/computers/internals/net/Rich...

7 hours ago | tankenmate

For Linux, one of my top 5 favourite books of all time is “Linux Core Kernel Commentary”, a Lions-style commentary on the Linux kernel, printed on pages so big it doesn’t fit on the shelf.

an hour ago | alfiedotwtf

> Copyright 1977 J. Lions

About 50 years ago. This is actually kind of cool.

It's part of history. I also still like the UNIX philosophy, though I'd say Linux doesn't follow it 1:1 these days.

My all-time favourite showcase of UNIX is Brian Kernighan demonstrating how pipes work:

https://www.youtube.com/watch?v=tc4ROCJYbm0

Nowadays I feel pipes, while still very useful, are not quite as powerful as they once were, partly because computer systems themselves have become more powerful and much of the software stack now comes with "batteries included". Many of the things that used to be done via pipes I now do in Ruby or Python, even if that may be less efficient. (I still use pipes too, of course; my point is just that they aren't as indispensable as they once were, even though they still have their use cases.)
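A made-up example of the kind of substitution I mean: where I might once have piped grep into wc -l, in Python I'd just write (app.log is an invented filename):

    # roughly: grep -c ERROR app.log
    print(sum(1 for line in open('app.log') if 'ERROR' in line))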

5 hours ago | shevy-java

I'd argue pipes are even more powerful with multi-core machines. You get parallelism for absolutely free. No thread pools to manage, no async, no channels, no new abstractions. Just the same pipe that works in the same way it did 50 years ago.

And because each step of the pipeline is a separate process, all N executables in the pipeline can run on separate cores.

I am not trying to say that every workload will benefit from this, or that such coarse parallelism is optimal for all use cases. But the fact that it is free with no changes to the pipe, the pipeline, or any of the executables is incredible.
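To make "for free" concrete, here is the same sort of two-stage pipeline written out by hand in Python (a sketch only; bookmarks.txt is a made-up input file). You still get the process-level parallelism, but now you manage the plumbing yourself:

    import subprocess

    # grep toread bookmarks.txt | shuf -n 8, spelled out by hand.
    # Each Popen is its own process, so the kernel already overlaps the
    # stages on separate cores, exactly as it does for the shell pipeline.
    grep = subprocess.Popen(['grep', 'toread', 'bookmarks.txt'],
                            stdout=subprocess.PIPE)
    shuf = subprocess.Popen(['shuf', '-n', '8'],
                            stdin=grep.stdout, stdout=subprocess.PIPE)
    grep.stdout.close()  # drop our copy so grep gets SIGPIPE if shuf exits early
    out, _ = shuf.communicate()
    print(out.decode())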

5 hours ago | greggyb

I agree. Pipes meant you could do things on a PDP-11 that other 16-bit systems couldn't normally come close to doing. I wasn't using the things at the time, but I've looked at competing 16-bit operating systems like Nova RDOS, RT-11, RSX-11, and MS-DOS (which faked pipes badly with unbuffered temporary files), and the difference from Unix was night and day, even before csh added command-line completion. (Smalltalk and Cedar may have been exceptions!)

Unix was such a nice programming environment that people at AT&T started using PDP-11 Unix as their development environment for mainframe software, for example for 36-bit Honeywell machines and IIRC 32-bit IBM 370s as well. Like how you run Arduino on your laptop to program a microcontroller, but the mainframe was the microcontroller.

I think that, once Perl and cheap VAXen roamed the land, Perl became a competitive replacement for a fair bit of the sort | join | cut stuff. Still, I use shell pipelines today pretty often. Here are some recent examples:

  grep toread | shuf -n 8
  awk '!/:$/ && !/^ *\./ {print $1}' | sort | uniq -c
  cat ../rfc-corpus/*/rfc*.txt |time ./justhash | head
  ls -lartc | tail -20
  dpkg -l 'lib*' | grep '^i' | shuf -n 8
  find -name '*test*c' -print0|xargs -0 grep malloc|shuf
  locate -ib eclipse | grep -i license
  ./sweetdreams-c++ | play -t raw -r 48000 -e signed-integer -b 16 -L -c 2 -
  locate -ib infinite|grep -i jest
  git log| grep Author: | uniq -c
  perl -lne 'print $1 if /^(\w+)/' tmp.hanoi.strace | sort | uniq -c | sort -n
  bzip2 -9c < ./hanoi.lua | wc -c
  nc -v -v -l | tar xvf -
  find | shuf -n 8 | xargs wc
(You can probably deduce from this that I'm old.)

You could imagine an interactive command prompt where it was just as convenient to do all that stuff without spawning off separate processes with pipes between them. But there are a few major ways that most alternatives fall short for things like this, and they're much shallower than the architectural stuff others have mentioned. Basically programming languages and command languages have conflicting requirements:

1. You need to be able to type stuff in one place to add functionality. Take the tmp.hanoi.strace line; the line preceding it in my history was perl -lne 'print $1 if /^(\w+)/' tmp.hanoi.strace, which I may or may not have ^C'ed out of halfway through. Suppose we've encapsulated that command into some lengthy Python expression:

    foo(bar, baz, quux, grault(toto, zozo))
and now we want to add the | sort | uniq -c | sort -n, which in Python is called collections.Counter. You have to insert "collections.Counter(" at the beginning of the line, and then go to the end of the line, and insert ")", to get

    collections.Counter(foo(bar, baz, quux, grault(toto, zozo)))
which is actually a very large amount of friction in this context. This sounds dumb but it really matters a lot to iteration velocity.

Also, you have to do it twice because you forgot to import collections. In the shell, the stuff in your path is always in your path, unless you forgot to install it on the computer you're using.
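For the record, here's roughly what the finished Python version of that pipeline looks like, import and all (a sketch, not golfed):

    import collections
    import re

    # perl -lne 'print $1 if /^(\w+)/' tmp.hanoi.strace | sort | uniq -c | sort -n
    counts = collections.Counter(
        mo.group(1)
        for line in open('tmp.hanoi.strace')
        for mo in [re.match(r'(\w+)', line)]
        if mo)
    for name, n in sorted(counts.items(), key=lambda kv: kv[1]):
        print(n, name)  # ascending by count, like the final sort -n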

2. Pipes do lazy evaluation. Some amount of lazy evaluation is probably important for that kind of interactive incremental buildup, because you don't really need to see all 7728 lines of tmp.hanoi.strace before you know that your regexp is doing the right thing. For example in Python you could say

    [mo.group(1) for line in open('tmp.hanoi.strace') for mo in [re.search(r'^(\w+)', line)] if mo]
but it tries to print out the whole thing on one line, after evaluating the entire list. You can make it lazy with a genex, but ergonomically that's even worse:

    >>> (mo.group(1) for line in open('tmp.hanoi.strace') for mo in [re.search(r'^(\w+)', line)] if mo)
    <generator object <genexpr> at 0x7fd2ac06ba60>
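The closest thing I know of to tacking a head onto the genex is itertools.islice, which at least stops after a few items without reading the whole file, though ergonomically it's still no head:

    import itertools, re
    g = (mo.group(1) for line in open('tmp.hanoi.strace')
         for mo in [re.search(r'^(\w+)', line)] if mo)
    print(list(itertools.islice(g, 8)))  # like | head -8: pulls only 8 matches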
Reading through this 7000-line file is fast enough to not worry about, but, for example, generating the list of 420,011 filenames in firefox-esr-140.4.0esr incurs a noticeable delay:

  [fname for root, dirs, files in os.walk('.') for fname in [f'{root}/{basename}' for basename in files]]
You can do something like this:

  >>> for fname, line in ((fname, line) for root, dirs, files in os.walk('.') for fname in [f'{root}/{basename}' for basename in files] if re.search('test.*c$', fname) for line in open(fname) if 'malloc' in line): print(fname, line.rstrip())
which sort of works, doing the same thing as my xargs -0 example, until it crashes because some source file contains invalid UTF-8. (But that's a different Python braindamage problem, mostly unrelated to the issues I'm talking about.)
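(That particular crash does have a workaround: tell open() to tolerate bad bytes. With fname as in the walk above, the inner loop becomes:)

    # same inner loop, but tolerating non-UTF-8 bytes instead of crashing
    for line in open(fname, errors='replace'):
        if 'malloc' in line:
            print(fname, line.rstrip())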

3. Less finger typing is really, really important. The point-free nature of the shell helps here; it's grep Author:, not grep Author: line > authors. And, as Yosef Kreinin famously pointed out in https://yosefk.com/blog/i-cant-believe-im-praising-tcl.html, definitely not authors = grep("Author:", line). Similarly for echo -n foobar vs. print(foobar, end=""). Admittedly this would be more convincing if my tmp.hanoi.strace argument had used grep -Po '^\w+' tmp.hanoi.strace instead of a Perl one-liner, but I guess I didn't think of that at the time!

But you can see from the above Python that the shell is very significantly terser, quite aside from requiring less punctuation, and I think that's generally the case. In my view, this point-free nature makes shell scripts (and Forth, and PostScript) significantly harder to read, but for a command language, it matters more how easy it is to write.

4. As Kreinin points out, command languages should privilege literal strings and numbers rather than variables and expressions, because if you have an expression you use often, you're going to put it into an alias or function or macro or whatever. I mean we eventually got head instead of sed 10q because people did it so often.

5. A command language is a user interface. User interfaces require feedback on the state of things, the current situation. Python has no equivalent to ls -lart for variables and functions; its dir() does exist, but does not produce very usable output. And although the default REPL does have tab-completion for variable names and properties (which pleasantly enough even excludes _ names and produces tabular output), you can't tab-complete filenames.

  >>> dir(collections)
  ['ChainMap', 'Counter', 'OrderedDict', 'UserDict', 'UserList', 'UserString', '_Link', '_OrderedDictItemsView', '_OrderedDictKeysView', '_OrderedDictValuesView', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_chain', '_collections_abc', '_count_elements', '_eq', '_iskeyword', '_itemgetter', '_proxy', '_recursive_repr', '_repeat', '_starmap', '_sys', '_tuplegetter', 'abc', 'defaultdict', 'deque', 'namedtuple']
  >>> collections.
  collections.ChainMap(     collections.OrderedDict(  collections.UserList(     collections.abc           collections.deque(        
  collections.Counter(      collections.UserDict(     collections.UserString(   collections.defaultdict(  collections.namedtuple(   
  >>> collections.
(IPython does tab-complete filenames.)
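You can of course roll your own rough equivalent; here's a sketch I might define in a session (the name ls is my own invention, not anything standard):

    import pprint

    def ls(obj=None):
        # a crude ls(1) for the REPL: names, minus the underscore noise
        names = dir(obj) if obj is not None else sorted(globals())
        pprint.pprint([n for n in names if not n.startswith('_')],
                      compact=True)
so that ls(collections) prints just the ten public names from the listing above. But having to build your own feedback is exactly the problem.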

AFAIK irb doesn't even have dir(). (https://stackoverflow.com/questions/468421/ruby-equivalent-o...)

6. Network effects: it's significantly easier to dpkg -l or git log in the shell than it is in Python. But this could go either way.

Anyway, I think it's possible to come up with a much better user interface for this kind of stuff, but nobody has done it yet. SIEUFERD/Ultorg (https://vimeo.com/173726371) shows a promising direction for relational database queries, and you could reformulate a lot of this stuff as relational database queries.

4 hours ago | kragen

> cheap VAXen roamed the land

I'm stealing this

an hour ago | veltas

If you steal some VAXen, I want one.

44 minutes ago | kragen

Lines 175, 180, and 185 have some extremely weird structs that do not make sense in today's C language. Back in Sixth Edition C, all struct fields were in the same namespace, so you could use '->lobyte' or '->hibyte' on any pointer you cared to. Good stuff; I'm sure it made for very interesting bugs.

9 hours ago | bediger4000

Which also explains why very old APIs like struct timeval and struct stat have namespaced field names (tv_sec, tv_usec; st_mode, st_size).

9 hours ago | ajross

FWIW, some of us still do this in C programs today. Giving struct members a distinctive prefix makes it extremely easy to find uses of those members with relatively simple tools like cscope.

4 hours ago | jclulow

Heh, tools. I just put an "XX" on the front of a field I want to refactor and see what breaks.

28 minutes ago | ajross

Can't cscope tell you everywhere a given struct type is used?