Append-Only Programming (2024)

The Unison programming language (https://www.unison-lang.org/) follows a similar idea. Functions have immutable IDs, and "modifying" a function means creating a new function. Every caller that needs to use the new function becomes a new function in turn, and this bubbles up to the top. All of this is assisted by tooling.

The Unison ecosystem leverages that property to implement distributed computation, where code can be shipped to remote workers in a very fine-grained way. I guess this building block could be used for other ideas too (I don't know, I haven't really put my mind to it, but it sounds very interesting).
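A toy C sketch of the "immutable ID" idea described above, for illustration only. It assumes a definition's ID is a hash of its body text plus the IDs of the definitions it calls; the names `def_id` and `edit_bubbles_up` are made up here, and FNV-1a is a stand-in, not what Unison itself uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

/* FNV-1a, 64-bit: a small non-cryptographic hash, used only as a stand-in. */
static uint64_t fnv1a(uint64_t h, const char *s) {
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical ID scheme: hash the body, then mix in the ID of every callee. */
uint64_t def_id(const char *body, const uint64_t *dep_ids, size_t n_deps) {
    char buf[17];
    uint64_t h;
    assert(body != NULL);
    h = fnv1a(FNV_OFFSET, body);
    for (size_t i = 0; i < n_deps; ++i) {
        snprintf(buf, sizeof buf, "%016llx", (unsigned long long)dep_ids[i]);
        h = fnv1a(h, buf);
    }
    return h;
}

/* Returns 1 if editing a callee gives the caller a new ID as well,
   even though the caller's own text did not change. */
int edit_bubbles_up(void) {
    uint64_t square_v1 = def_id("x * x", NULL, 0);
    uint64_t square_v2 = def_id("x * x + 0", NULL, 0);
    uint64_t caller_v1 = def_id("square(a) + square(b)", &square_v1, 1);
    uint64_t caller_v2 = def_id("square(a) + square(b)", &square_v2, 1);
    return square_v1 != square_v2 && caller_v1 != caller_v2;
}
```

Because the caller's ID depends on the callee's ID, the old caller and the old callee both stay addressable after the edit, which is what makes the scheme append-only.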

Scrapscript does something similar. Really interesting!

https://scrapscript.org/

I wish more languages supported first-class function versioning in this style instead of the _v2 convention people are typically forced to adopt.

Y'know, after working with gRPC for a while I've decided that I actually like explicitly putting the version in the name whenever you allow multiple versions to exist simultaneously.

The thing about making it explicit is that it means you can see what version is being used just by reading the code, and you can also find usages of a specific version with a simple text search. Without that, it's more likely that maintainers will get stuck relying on more complicated - and therefore brittle - tooling to manage the code.

Interesting approach. A bit similar to 'test && commit || revert' (TCR) as done by Kent Beck.

I'm kind of doing this for my AoC with my literate programming approach, where I only add code to the markdown file that is then processed by the MarkDownC program [1], which takes all the C fragments in the markdown file and puts them in the right order to be compilable, with later definitions of functions and variables overwriting earlier ones. So each markdown file, one per day [2], shows all the steps by which I arrived at the solution. I do use a normal editor and use copy-and-paste a lot when making new versions of a certain function.

[1] https://github.com/FransFaase/IParse/?tab=readme-ov-file#mar...

[2] https://github.com/FransFaase/AdventOfCode2024/blob/main/Day...

Neat. I’ve done some kinda sorta similar stuff with org mode in emacs. I really like it, but it does just make me long for a system with built in tracking, automatic synthesis of current focus from programs in use, etc.

Imagine something like a Jupyter notebook, with embedded browser widget; you look something up and that info is embedded in notebook. Same with bits of source files, shell commands and output, etc.

Trying to remember how I did something like 6 months ago? Open that notebook and scroll from bottom.

I’m skeptical of LLM-take-over-the-world narratives, but I think a document like this could also be used by them to synthesize summaries, TOCs, etc.

One of these days I’ll get around to implementing this kind of flow myself. I hope. Someday.

and you often discover, in the middle of writing your low-level functions, that your high-level functions need to be revised, which append-only programming makes difficult

On the other hand, it's not a problem if you start bottom-up, which is a natural style when writing in C; the low-level functions are at the top (and the standard headers included at the very top can be thought of as a sort of lowest-level), while main() is at the bottom.

Interesting. I wouldn't be able to keep that straight in my head - I wouldn't be able to predict all the low-level functions I needed until I'd built the higher-level functions. In fact I'd approach it exactly backwards - start at the top and work my way down. It matches perfectly with a Jackson Structured Programming diagram in my mind.

Build main() by decomposing it into calls to input(), process(), output(). At each step decompose the function you're working on, handing as much of its work as possible off to a child function you haven't yet written. Only question after that is whether you implement those functions depth-first or breadth-first (I'd prefer depth-first).
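For what it's worth, the top-down order also fits an append-only C file, as long as the children are declared before the parent uses them. A minimal sketch, assuming made-up names (`run`, `read_input`, `process`, `output`) and a trivial task (parse a number, double it, print it); a real main() would just call run():

```c
#include <assert.h>
#include <stdio.h>

/* Step 1: write the top level first, declaring children that do not exist yet. */
static int read_input(const char *text);
static int process(int value);
static void output(int value);

int run(const char *text) {
    int result = process(read_input(text));
    output(result);
    return result;
}

/* Step 2 (appended later): implement the children, depth-first. */
static int read_input(const char *text) {
    int n = 0;
    assert(text != NULL);
    /* Parse leading decimal digits only. */
    while (*text >= '0' && *text <= '9')
        n = n * 10 + (*text++ - '0');
    return n;
}

static int process(int value) { return value * 2; }

static void output(int value) { printf("%d\n", value); }
```

The prototypes are the only forward planning needed; if a child later turns out to need a different signature, that is exactly the revision the quoted passage says append-only programming makes difficult.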

Maybe procedural programming is now so old it's novel again.

I find it's easy enough to adapt to - and switch between - either convention, as long as there is indeed a convention.

What really hurts readability for me is when the structure is a random mix of top-down and bottom-up. Which seems to be regrettably common in many object-oriented codebases.

I have a slightly different approach. Quoting my tweet:

"Once @cognition_labs AI Engineer Devin gets good enough, I will have it implement each feature as a series of Pull Requests: - One or more Refactoring PRs - modify structure of existing code but no behavioral change. - A final PR which is "append only" code - no structural change, only behavioral."

https://x.com/realsanketp/status/1879766736742092938

Nice as a thought experiment, but you actually do get this in real life as well, when maintaining a public API with a large user base (where even the details of the internal workings need to be frozen over time.)

Gives you lovely stuff like the Win32 API (introduced in 1993 and still very much a thing!). CreateWindow, CreateWindowEx, CreateWindowExEx (OK, I made that up...), structs with a load-bearing length field, etc. etc. And read some Raymond Chen on the abuse that customers inflict on the more 'private' stuff...

But you do get perfect backward compatibility. If you want to opt-in to the bug fix, you use a newer version of the function.
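A rough C sketch of the "load-bearing length field" pattern mentioned above, in the spirit of the `cbSize` member of structs such as WNDCLASSEX. The struct and function names here are hypothetical, not real Win32 API, and the opacity field is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Version 1 of a hypothetical options struct. */
struct window_opts_v1 {
    size_t size;    /* caller sets this to sizeof its own struct */
    int width;
    int height;
};

/* Version 2 only appends a field; existing members keep their offsets. */
struct window_opts_v2 {
    size_t size;
    int width;
    int height;
    int opacity;    /* new in v2, 0..255 */
};

/* Returns the effective opacity, or -1 for an unknown struct version.
   Callers compiled against v1 get the old behavior (fully opaque). */
int create_window(const void *opts) {
    struct window_opts_v2 full = { .opacity = 255 };
    size_t size;
    assert(opts != NULL);
    memcpy(&size, opts, sizeof size);   /* size is the first member in every version */
    if (size == sizeof(struct window_opts_v1))
        memcpy(&full, opts, sizeof(struct window_opts_v1));
    else if (size == sizeof(struct window_opts_v2))
        memcpy(&full, opts, sizeof(struct window_opts_v2));
    else
        return -1;
    return full.opacity;
}

/* An old caller that only knows about v1. */
int demo_old_caller(void) {
    struct window_opts_v1 o = { sizeof o, 640, 480 };
    return create_window(&o);
}

/* A new caller that opts in to the v2 field. */
int demo_new_caller(void) {
    struct window_opts_v2 o = { sizeof o, 640, 480, 128 };
    return create_window(&o);
}
```

The size field is what lets one entry point serve binaries built years apart, at the cost of never being able to reorder or remove a member.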

I recall there being a blog post, maybe by Joe Armstrong, in which he advocated for having version numbers in function names: CreateWindow_v1, CreateWindow_v2, or you could use the commit hash.

I do think version numbers in function names are the least bad version of this. I seem to recall a demonstration of a content-addressable function calling system which could effectively garbage-collect no-longer-called function implementations, but that's only useful if you have the library and all its callers in the same execution context.

I was playing around with GPIO some time ago. The Linux kernel seems to adopt this approach. The first rule of Linux ABI is that thou shalt not break ABI. The second rule of Linux ABI is THOU SHALT NOT BREAK ABI.

So they had something like GPIOHANDLE_GET_LINE_VALUES_IOCTL, decided to change things around a little, so introduced GPIO_V2_LINE_GET_VALUES_IOCTL.

Although, as the saying goes, the problem with backwards compatibility is that anything that starts backwards stays backwards.

It also reminds me a little of DNA. There's a lot of cruft that doesn't do very much, but the organism is still viable anyway.

This is how microservices should work and in practice do work, but the deployment teams have to coordinate running n+k versions simultaneously to make sure that no in-flight requests or clients are broken. This can take months, making k very large.

Very interesting also as a thought-provoking idea. For example

  - It would be less challenging if function pointer variables were used instead of functions. In that case, code appended later may override the function variables it needs to fix or change

  - Since all the code is there, it is possible to invent some convention to compile/run previous versions without CVS machinery

I'm not sure how you would override the function pointers repeatedly without getting a compile error. Maybe you could instead use functions with GCC's weak attribute?

I guess they mean to overwrite the pointers:

  void (*foo)(void)
  int main(){
    *foo();
  }
  void foo1(){...}
  foo = &foo1; // will be overwritten later
  void foo2(){...}
  foo = &foo2;

I'm on my phone so I haven't compiled it, but that's the rough idea I got: define lots of versions and then set the pointer to the latest one.

(You can skip the address-of for functions by punning, but reading the types is hard enough already.)

That's not valid C; you can't put statements outside of functions, not even assignment statements or other expression statements. No C compiler I've ever used will accept it. Where would the compiled version of those statements go, in an init function? Linkers added support for those for C++ static initializers, where execution order isn't defined; C doesn't have anything similar. Are you expecting the compiler to execute the assignment statements at compile time, with a C interpreter?

(What's the language lawyer equivalent of an ambulance chaser?)

Fair point, I hadn't thought it all the way through. It's also all too easy to assume you can use C++ features or GNU extensions if you're not in the habit of enforcing the standard. For your trouble here are two partial solutions:

---

1. CPP abuse

If I was doing this myself I'd probably just use the preprocessor to make the last function my entry point like

  SOURCE=file.c  gcc -DLATEST_MAIN=$(grep -E "^[a-zA-Z_].*\(.*\)\s*\{" ${SOURCE} | tail -1 | grep -Po "[a-zA-Z_]\w*(?=\()") ${SOURCE}

and then you can start with

  int main(void){LATEST_MAIN();};

and append

  void foo(){...}
  ...
  void bar(){...}

and execution will start from the last defined function (that matches the regex).

---

2. Pre-defining non-static functions

If that's cheating then you can redefine "external" functions before the linker has a chance to provide them. For example:

  #include <stdlib.h>
  int (*foo)(void);
  int main(void){exit(0); *foo();}

is a valid program, and you can append

  void f1(){puts("oh");}
  int exit(){foo=&f1;}

and foo will indeed be called. It does require some forward planning if you want to append multiple times, but it shouldn't be so difficult to generate a library full of dummy functions to use.

This isn't a language-lawyer kind of thing, and, as I said, I don't think C++ features or GNU extensions will help. The pre-defining extern functions approach will get you a finite number of tries, but I suppose it could be a fairly large finite number.

If for some reason I had to solve this problem in real life I would do it with a postprocessing step like your #1 suggestion. But once you're putting code into your Makefile anything goes.

You can just append a function with __attribute__((constructor)) if GNU extensions are allowed and that will run before main and let you mess with the function pointers.

In an undefined order?

Objection! Badgering the witness!

I'm sure you know, but the undefined order is easily fixed by e.g. keeping a priority queue for your appended functions and then each time you add one you can include a constructor that enters it in the queue with a priority corresponding to its desired position.

And then main() pops the latest override off the priority queue? I guess you're right; GCC extensions do solve the problem. And you can do the same thing with static object constructors in C++, though language-lawyeringly I'm not sure the standard guarantees enough about static initialization order to give the first constructor a well-defined empty queue.
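A minimal sketch of the constructor approach discussed above, assuming GCC or Clang. Instead of a hand-rolled priority queue it uses the optional priority argument of the constructor attribute (lower numbers run first, and values up to 100 are reserved for the implementation), so the ordering is explicit. The names `answer`, `run`, and the two versions are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Written first: the rest of the program only ever calls through this pointer. */
static int (*answer)(void);

/* Stand-in for whatever main() would call. */
int run(void) {
    assert(answer != NULL);   /* some constructor must have installed a version */
    return answer();
}

/* --- appended later: first attempt --- */
static int answer_v1(void) { return 41; }

__attribute__((constructor(101)))
static void install_v1(void) { answer = answer_v1; }

/* --- appended later still: the fix --- */
static int answer_v2(void) { return 42; }

/* A higher priority number runs later, so this assignment is the one that sticks. */
__attribute__((constructor(102)))
static void install_v2(void) { answer = answer_v2; }
```

Each appended version just needs a priority number larger than the previous one. This is a GNU extension rather than standard C, so it does not answer the strict-C objection raised earlier in the thread.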

I see no issue in setting it dynamically from code that runs during start-up. A global variable holds the function pointer, and it is either changed several times (if the latest addition allows the previous start-up routines to run and then sets its new value) or just once for the latest version (if the logic somehow disables the previous start-up routines). It could be called "dynamic inheritance".

How do you change the global variable, though?

Just an aside: I really like the design of this blog: it's very clean and the text is black on white. It's kinda shocking how many websites disregard basic accessibility rules about contrast and make text light gray.

You might like my Neat CSS framework, which I use for everything these days. I didn’t use perfectly black, it’s off just a bit, which I find pleasing, but it still has plenty of contrast.

https://neat.joeldare.com

Gerald Sussman (of MIT/SICP fame) has written and spoken a lot about practical ways of achieving this kind of thing. The idea is when new features are added to software you only have to write new code, not change the existing code (and, likewise, removing features is only deleting code). Having a well-defined process that allowed this kind of software development in a practical way is the dream really.

Back in the days when having your own tape reel for storing code was a thing (an upgrade from punch-cards), we used to do this .. write the first version of the code, stream it to tape, enhance the code some more, produce a diff, write the diff to tape, and on and .. on and on .. such that, to restore a working copy of a codebase, we'd rewind the tape "sync;sync;sync" and then progressively apply every diff as it was loaded from the stream. Every few months or so, we'd rewind the tape and stream the updated code to the front of it, and repeat the process again .. it was kind of fun to think that the whole tape had a working history.

These days of course we just use git, but there was a day that we could see the progress of a codebase by watching the diffs as they streamed in off the reels ..

I have played around a lot with GW-BASIC (and pcbasic (pip install pcbasic)) lately and this strikes me as something that could be made to work very well with old BASIC variants using line-numbers, since when the interpreter sees a new line with the same number as an existing line it will overwrite the old line. Tried this in a text file:

   10 print "helo"
   20 print "world"
   10 print "hello"

Worked as expected in both GW-BASIC 1.0 and pcbasic, printing "hello\nworld". Listing the program after loading it only shows the modified line 10.

A bit awkward since the BASIC editor/REPL itself can not be used. It would work for writing BASIC using a regular text editor and then just running it with BASIC as an interpreter.
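To make the "same line number overwrites the old line" behavior concrete, here is a small C sketch of a line-numbered program store. It is only a model of the behavior described above, not how GW-BASIC or pcbasic are implemented; all names are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINES 64

struct line { long number; const char *text; };
struct program { struct line lines[MAX_LINES]; int count; };

/* Store one source line, keeping lines sorted by number.
   A repeated line number replaces the earlier text. */
void enter_line(struct program *p, const char *src) {
    char *rest;
    long number = strtol(src, &rest, 10);
    int i = 0;
    while (*rest == ' ') ++rest;                 /* skip blanks after the number */
    while (i < p->count && p->lines[i].number < number) ++i;
    if (i < p->count && p->lines[i].number == number) {
        p->lines[i].text = rest;                 /* later line wins */
        return;
    }
    assert(p->count < MAX_LINES);
    memmove(&p->lines[i + 1], &p->lines[i],
            (size_t)(p->count - i) * sizeof p->lines[0]);
    p->lines[i].number = number;
    p->lines[i].text = rest;
    p->count++;
}

/* Replays the three-line file from the comment above. */
static struct program load_example(void) {
    struct program p = { .count = 0 };
    enter_line(&p, "10 print \"helo\"");
    enter_line(&p, "20 print \"world\"");
    enter_line(&p, "10 print \"hello\"");
    return p;
}

/* Number of lines that survive in the stored program. */
int example_line_count(void) { return load_example().count; }

/* Returns 1 if line 10 holds the corrected text. */
int example_line10_is_corrected(void) {
    struct program p = load_example();
    return strcmp(p.lines[0].text, "print \"hello\"") == 0;
}
```

The source file keeps the whole history, while the stored program only ever holds the latest version of each line.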

If you have a large body of existing code and you want to change its behaviour, you have to work out where to add your change, without breaking what already is there.

My thought on this is to somehow create a system where additional rules or changes to behaviour have marginal cost.

I am interested in the Rete algorithm, a rule engine algorithm. But we could run this sort of thing at compile time to wire up system architecture.

Boilerplate or configuration is an enormous part of programming and I feel there really could be more tools to transform software architecture.

So... it's literally the open-closed principle, in its straw-man form? Where you actually can't change the old code but can only write new?

Well, it's ridiculous. IMO, of course but... seriously. One of the greatest (and even joyful) things about being a software developer is that you can change old code. Literally go there, rewrite things, and end up with a new version of code (which is presumably better in some respect).

All bugs in the program automatically become features too!

I'm personally very sympathetic to and interested in avant-garde, experimental software development methods like this. I understand that most devs reading this are mortified and doubt my sanity, but I do unironically use extremely, stupidly limiting techniques like this. For example, in my current project (working on an automated theorem prover) I have this rule that development happens in epochs. I write code in file theorem_prover_v1.rs, run some unit tests, run some tests, and take notes. Then I copy the file to theorem_prover_v2.rs and attempt to rewrite the whole thing based on the previous form. Every line is critically examined, and I try to modify as many lines as possible. Do we need this type? Is this mathematically sound? Can I really justify that this abstraction is necessary? [1] It's an extremely inefficient and slow process, but a lot of software engineers don't appreciate that development efficiency, although usually among the most important factors in the success of a project, is not necessarily the only important factor for all projects, and it is worth experimenting with methods that are intentionally inefficient but have promise to improve something else. If you don't like it, you can always go back to Agile or whatever else you do at your day job.

Art progresses with extreme restrictions. The same way Schoenberg put seemingly absurd restrictions in his music ((very roughly) don't repeat the same note before playing every other note etc...) to create something radically novel, we as software developers can do so to advance our art as well.

[1] This method is the antithesis of the common "never rewrite a working program" software development methodology. Here, the experiment is to see what happens if we always rewrite and never add or modify, i.e. refactors are never allowed; instead, if things need changing, we need to redesign the whole thing top to bottom with the new understanding.

I believe this approach was pioneered in CSS.

David Harel, the creator of statecharts, also developed the Behavioral Programming [0] 'b-thread' model motivated by a similar vision for append-only programming - it has been discussed on HN previously e.g. https://news.ycombinator.com/item?id=42060215

[0] https://cacm.acm.org/research/behavioral-programming/

It's like having one hand tied and typing code with the other hand. Then using just four fingers. Then three.

I've seen it in database schemas, since so many colleagues treat ALTER TABLE as black magic. In one example, there is a CRUD app with a projects table and then, later, another table with a one-to-one mapping to projects. Of course there are integrity problems and N+1s.

So they embraced column-oriented DBs before everyone else did?

What software has this person written that warrants giving a damn what they have to say about programming?

Anyone telling me to write software using `cat >> foo.c` better come with some receipts.

I thought it would be about LLMs doing coding because they often just start over and add more code rather than editing existing code

This is a fun exercise, but I've seen it many times in big companies where people are too afraid to make changes. That's also where the open-closed principle comes from, something I no longer like in practice, because one has to get away from the fear of breaking things to maintain clean code and clean architecture.

This is a great paradigm to pair with a suicide linux dev env.

I wondered if this might be a joke about ethereum

people unironically thinking this is a good idea

>And it produces source code that is eminently readable, because the text of the program recapitulates your train of thought – a kind of stream-of-consciousness literate programming.

Sorry, but what? This does not make any sense.

log coding

like LLMs

very cool
