GCC SC approves inclusion of Algol 68 Front End

In my mind this highlights something I've been thinking about, the differences between FOSS influenced by corporate needs vs FOSS driven by the hacker community.

FOSS driven by hackers is about increasing and maintaining support (old and new hardware, languages etc..) while FOSS influenced by corporate needs is about standardizing around 'blessed' platforms like is happening in Linux distributions with adoption of Rust (architectures unsupported by Rust lose support).

> while FOSS influenced by corporate needs is about standardizing around 'blessed' platforms like is happening in Linux distributions with adoption of Rust

Rust's target tier support policies aren't based on "corporate needs". They're based, primarily, on having people willing to do the work to support the target on an ongoing basis, and provide the logistics needed to make sure it works.

The main difference, I would say, is that many projects essentially provide the equivalent of Rust's "tier 3" ("the code is there, it might even work") without documenting it as such.

The issue is that certain specific parts of the industry currently pour in a lot of money into the Rust ecosystem, but selectively only where they need it.

How is that different than scratching one’s own itch?

Personal itches are more varied and strange than corporate itches. What companies are willing to pour time (money) into is constrained by market forces. The constraints on the efforts of independent hackers are different.

Both sets of constraints produce patterns and gaps. UX and documentation are commonly cited gaps for volunteer programming efforts, for example.

But I think it's true that corporate funding has its own gaps and other distinctive tendencies.

The Rust Community is working on gcc-rs for this very reason.

gcc-rs is far from being usable. If you want to use Rust with gcc-only targets you're probably better off with rustc_codegen_gcc instead.

The big difference is that Algol 68 is set in stone. This is what allows a single dedicated person to write the initial code and for it to keep working essentially forever with only minor changes. The Rust frontend will inevitably become obsolete without active development.

Algol 68 isn’t any more useful than obsolete Rust, however.

The core Algol 68 language is indeed set in stone.

But we are carefully adding many GNU extensions to the language, as was explicitly allowed by the Revised Report:

  [RR page 52]
  "[...] a superlanguage of ALGOL 68 might be defined by additions to
   the syntax, semantics or standard-prelude, so as to improve
   efficiency or to permit the solution of problems not readily
   amenable to ALGOL 68."

The resulting language, which we call GNU Algol 68, is a strict super-language of Algol 68.

You can find the extensions currently implemented by GCC listed at https://algol68-lang.org/

It's funny, I have a different view. Corporates often need LT maintenance and support for weird old systems. The majority of global programming community often chases shiny new trends in their personal tinkering.

However I think there's the retro-computing, and other hobby niches that align with your hacker view. And certainly there's a bunch of corp enthusiasm for standardizing shiny things.

I think you both are partially right. In fact, the friction I see are where the industry relies on the open-source community for maintenance but then pushes through certain changes they think they need, even if this alienates part of the community.

I don’t know that that is fair.

A number of years ago I worked on a POWER9 GPU cluster. This was quite painful - Python had started moving to use wheels and so most projects had started to build these automatically in CI pipelines but pretty much none of these even supported ARM let alone POWER9 architecture. So you were on your own for pretty much anything that wasn’t Numpy. The reason for this of course is just that there was little demand and as a result even fewer people willing to support it.

You nailed it. I am in the process in my spare time to maintain old Win32 apps, that corporates and always-the-latest-and-greatest crowd has abandoned.

Most people don't care about our history, only what is shiny.

It is sad!

As a fan of Algol 68, I'm pretty excited for this.

For people who aren't familiar with the language, pretty much all modern languages are descended from Algol 60 or Algol 68. C descends from Algol 60, so pretty much every popular modern language derives from Algol in some way [1].

[1] https://ballingt.com/assets/prog_lang_poster.png

Yes, massively influential, but was it ever used or popular?, I always think of it as sort of the poster child for the danger of "design by committee".

Sure it's ideas spawned many of today's languages, But wasn't that because at the time nobody could afford to actually implement the spec. So we ended up with a ton of "algols buts" (like algol but can actually be implemented and runs on real hardware).

Yes, for example UK navy had a system developed in Algol 68 subset.

https://academic.oup.com/comjnl/article-abstract/22/2/114/42...

Used extensively on Burroughs mainframes.

Burroughs used an Algol60 derivative (not '68)

ESPOL initially, which evolved into NEWP.

I would argue C comes from Algol68 (structs, unions, pointers, a full type system etc, no call by name) rather than Algol60

C had 3 major sources, B (derived from BCPL, which had been derived from CPL, which had been derived from ALGOL 60), IBM PL/I and ALGOL 68.

Structs come from PL/I, not from ALGOL 68, together with the postfix operators "." and "->". The term "pointer" also comes from PL/I, the corresponding term in ALGOL 68 was "reference". The prefix operator "*" is a mistake peculiar to C, acknowledged later by the C language designers, it should have been a postfix operator, like in Euler and Pascal.

Examples of things that come from ALGOL 68 are unions (unfortunately C unions lack most useful features of the ALGOL 68 unions. which are implicitly tagged unions) and the combined operation-assignment operators, e.g. "+=" or "*=".

The Bourne shell scripting language, inherited by ksh, bash, zsh etc., also has many features taken from ALGOL 68.

The explicit "malloc" and "free" also come from PL/I. ALGOL 68 is normally implemented with a garbage collector.

C originally had =+ and =- (upto and including Unix V6) - they were ambiguous (a=-b means a= -b? or a = a-b?) and replaced by +=/-=

The original structs were pretty bad too - field names had their own address space and could sort of be used with any pointer which sort of allowed you to make tacky unions) we didn't get a real type system until the late 80s

ALGOL 68 had "=" for equality and ":=" for assignment, like ALGOL 60.

Therefore the operation with assignment operators were like "+:=".

The initial syntax of C was indeed weird and it was caused by the way how their original parser in their first C compiler happened to be written and rewritten, the later form of the assignment operators was closer to their source from ALGOL 68.

> it should have been a postfix operator, like in Euler and Pascal.

I never liked Pascal style Pointer^. As the postfix starts to get visually cumbersome with more than one layer of Indirection^^. Especially when combined with other postfix Operators^^.AndMethods. Or even just Operator^ := Assignment.

I also think it's the natural inverse of the "address-of" prefix operator. So we have "take the address of this value" and "look through the address to retreive the value."

The "natural inverse" relationship between "address-of" and indirect addressing is only partial.

You can apply the "*" operator as many times you want, but applying "address-of" twice is meaningless.

Moreover, in complex expressions it is common to mix the indirection operator with array indexing and with structure member selection, and all these 3 postfix operators can appear an unlimited number of times in an expression.

Writing such addressing expressions in C is extremely cumbersome, because they require a great number of parentheses levels and it is still difficult to see which is the order in which they are applied.

With a postfix indirection operator no parentheses are needed and all addressing operators are executed in the order in which they are written.

So it is beyond reasonable doubt that a prefix "*" is a mistake.

The only reason why they have chosen "*" as prefix in C, which they later regretted, was because it seemed easier to define the expressions "*++p" and "*p++" to have the desired order of evaluation.

There is no other use case where a prefix "*" simplifies anything and for the postfix and prefix increment and decrement it would have been possible to find other ways to avoid parentheses and even if they were used with parentheses that would still have been simpler than when you have to mix "*" with array indexing and with structure member selection. Moreover, the use of "++" and "--" with pointers was only a workaround for a dumb compiler, which could not determine by itself whether it should access an array using indices or pointers. Normally there should be no need to expose such an implementation detail in a high-level language, the compiler should choose the addressing modes that are optimal for the target CPU, not the programmer. On some CPUs, including the Intel/AMD CPUs, accessing arrays by incrementing pointers, like in the old C programs, is usually worse than accessing the arrays through indices (because on such CPUs the loop counter can be reused as an index register, regardless of the order in which the array is accessed, including for accessing multiple arrays, avoiding the use of extra registers and reducing the number of executed instructions).

With a postfix "*", the operator "->" would have been superfluous. It has been added to C only to avoid some of the most frequent cases when a prefix "*" leads to ugly syntax.

> You can apply the "*" operator as many times you want, but applying "address-of" twice is meaningless.

This is due to the nature of lvalue and rvalue expressions. You can only get an object where * is meaningful twice if you've applied & meaningfully twice before.

    int a = 42;
    int *b = &a;
    int **c = &b;

I've applied & twice. I merely had to negotiate with the language instead of the parser to do so.

> and all these 3 postfix operators can appear an unlimited number of times in an expression.

In those cases the operator is immediately followed by a non-operator token. I cannot meaningfully write a[][1], or b..field.

> The only reason why they have chosen "*" as prefix in C, which they later regretted, was because it seemed easier to define the expressions "++p" and "p++" to have the desired order of evaluation.

It not only seems easier it is easier. What you sacrifice is complications is defining function pointers. One is far more common than the other. I think they got it right.

> With a postfix "*", the operator "->" would have been superfluous.

Precisely the reason I dislike the Pascal**.Style. Go offers a better mechanism anyways. Just use "." and let the language work out what that means based on types.

I'm offering a subjective point of view. I don't like the way that looks or reads or mentally parses. I'm much happier to occasionally struggle with function pointers.

A postfix "*" would be completely redundant since you can just use p[0] . Instead of *p++ you'd have (p++)[0] - still quite workable.

> The only reason why they have chosen "" as prefix in C, which they later regretted, was because it seemed easier to define the expressions "++p" and "*p++" to have the desired order of evaluation.

There has been no shortage of speculation, much of it needlessly elaborate. The reality, however, appears far simpler – the prefix pointer notation had already been present in B and its predecessor, BCPL[0]. It was not invented anew, merely borrowed – or, more accurately, inherited.

The common lore often attributes this syntactic feature to the influence of the PDP-11 ISA. That claim, whilst not entirely baseless, is at best a partial truth. The PDP-11 did support pre-increment and post-increment indirect address manipulation – but notably lacked their symmetrical complements: pre-increment and post-decrement addressing modes[1]. In other words, it exhibited asymmetry – a gap that undermines the argument for direct PDP-11 ISA inheritance, i.e.

  MOV (Rn)+, Rm

  MOV @(Rn)+, Rm

  MOV -(Rn), Rm

  MOV @-(Rn), Rm

existed but not

  MOV +(Rn), Rm

  MOV @+(Rn), Rm

  MOV (Rn)-, Rm

  MOV @(Rn)-, Rm

[0] https://www.thinkage.ca/gcos/expl/b/manu/manu.html#Section6_...

[1] PDP-11 ISA allocates 3 bits for the addressing mode (register / Rn, indirect register (Rn), auto post-increment indirect / (Rn)+ , auto post-increment deferred / @(Rn)+, auto pre-decrement indirect / -(Rn), auto pre-increment deferred / @-(Rn), index / idx(Rn) and index deferred / @idx(Rn) ), and whether it was actually «let's choose these eight modes» or «we also wanted pre-increment and post-decrement but ran out of bits» is a matter of historical debate.

The prefix "*" and the increment/decrement operators have been indeed introduced in the B language (in 1969, before the launch of PDP-11 in 1970, but earlier computers had some autoincrement/autodecrement facilities, though not as complete as in the B language), where "*" has been made prefix for the reason that I have already explained.

The prefix "*" WAS NOT inherited from BCPL, it was purely a B invention due to Ken Thompson.

In BCPL, "*" was actually a postfix operator that was used for array indexing. It was not the operator for indirection.

In CPL, the predecessor of BCPL, there was no indirection operator, because indirection through a pointer was implicit, based on the type of the variable. Instead of an indirection operator, there were different kinds of assignment operators, to enable the assignment of a value to the pointer, instead of assigning to the variable pointed by the pointer, which was the default meaning.

BCPL has made many changes in the syntax of CPL, whose main reason was the necessity of adapting the language to the impoverished character set available on American computers, which lacked many of the characters that had been available in Europe before IBM and a few other US vendors have succeeded to replace the local vendors, also imposing thus the EBCDIC and later the ASCII character sets.

Several of the changes done between BCPL and B had the same kind of reason, i.e. they were needed to transition the language from an older character set to the then new ASCII character set. For instance the use of braces as block delimiters was prompted by their addition into ASCII, as they were not available in the previous character set.

The link that you have provided to a manual of the B language is not useful for historical discussions, as the manual is for a modernized version of B, which contains some features back-ported from C.

There is a manual of the B language dated 1972-01-07, which predates the C language, and which can be found on the Web. Even that version might have already included some changes from the original B language of 1969.

* was the usual infix multiplication operator in BCPL, and it was not used for pointer arithmetic.

The BCPL manual[0] explains the «monadic !» operator (section 2.11.3) as:

  2.11.3 MONADIC !

  The value or a monadic ! expression is the value of the storage cell whose address is the operand of the !. Thus @!E = !@E = E, (providing E is an expression of the class described in 2.11.2).

  Examples.

  !X := Y Stores the value of Y into the storage cell whose address is the value of X.

  P := !P Stores the value of the cell whose address is the value of P, as the new value of P.

The array indexing used the «V ! idx» syntax (section 2.13, «Vector application»).

So, the ! was a prefix operator for pointers, and it was an infix operator for array indexing.

In Richard's account of BCPL's evolution, he noted that on early hardware the exlamation mark was not easily available, and, therefore, he used a composite *( (i.e. a diagraph):

  «The star in *( was chosen because it was available … and it seemed appropriate for subscription since it was used as the indirection operator in the FAP assembly language on CTSS. Later, when the exclamation mark became available, *( was replaced by !( and exclamation mark became both a dyadic and monadic indirection operator».

So, in all likelihood, !X := Y became *(X := Y, eventually becoming *X = Y (in B and C) whilst retaining the exact and original semantics of the !.

[0] https://rabbit.eng.miami.edu/info/bcpl_reference_manual.pdf

The BCPL manual linked by you is not useful, as it describes a recent version of the language, which is irrelevant for the evolution of the B and C languages.

The use of the character "!" in BCPL is much later than the development of the B language from BCPL, in 1969.

The asterisk had 3 uses in BCPL, as the multiplication operator, as a marker for the opening bracket in array indexing, to compensate for the lack of different kinds of brackets for function evaluation and for array indexing, and as the escape character in character strings. For the last use the asterisk has been replaced by the backslash in C.

There was indeed a prefix indirection operator in BCPL, but it did not use any special character, because the available character set did not have any unused characters.

The BCPL parser was separate from the lexer, and it was possible for the end users to modify the lexer, in order to assign any locally available characters to the syntactic tokens.

So if a user had appropriate characters, they could have been assigned to indirection and address-of, but otherwise they were just written RV and LV, for right-hand-side value and left-hand-side value.

It is not known whether Ken Thompson had modified the BCPL lexer for his PDP computer, to use some special characters for operators like RV and LV.

In any case, he could not have used asterisk for indirection, because that would have conflicted with its other uses.

The use of asterisk for indirection in B became possible only after Ken Thompson has made many other changes and simplifications in comparison with BCPL, removing any parsing conflicts.

You are right that BCPL already had prefix operators for indirection and address-of, which was different from how this had been handled in CPL, but Martin Richards did not seem to have any reason for this choice and in BCPL this was a less obvious mistake, because it did not have structures.

On the other hand, Ken Thompson did want to have "*" as prefix, after introducing his increment and decrement operators, in order to need no parentheses for pre- and post-incrementation or decrementation of pointers, in the context where postfix operators were defined as having higher precedence than prefix.

Also in his case this was not yet an obvious mistake, because he had no structures and the programs written in B at that time did not use any complex data structures that would need correspondingly complex addressing expressions.

Only years later it became apparent that this was a bad choice, while the earlier choice of N. Wirth in Euler (January 1966; the first high-level language that handled pointers explicitly, with indirection and address-of operators) had been the right one.

Decades later, complex data structures became common while the manual optimization of manipulating explicitly pointers for addressing arrays became a way of writing inefficient programs, which prevent the compiler from optimizing correctly the array accessing for the target CPU.

So the choice of Ken Thompson can be justified in its context from 1969, but in hindsight it has definitely been a very bad choice.

C's «static» and «auto» also come from PL/I. Even though «auto» has never been used in C, it has found its place in C++.

C also had a reserved keyword, «entry», which had never been used before eventually being relinquished from its keyword status when the standardisation of C began.

C23 also has reused auto as C++, although type inference is more limited.

That is indeed correct. Kernighan in his original book on C cited Algol 68 as a major influence.

> I'm pretty excited for this

Aside from historical interest, why are you excited for it?

Personally, I think the whole C tangent was a misstep and would love to see Algo 68 turn into Algo 26 or 27. I sort of like C and C++ and many other languages which came, but they have issues. I think Algo 68 could develop into something better than C++, it has some of the pieces already in place.

Admittedly, every language I really enjoy and get along with is one of those languages that produced little compared to the likes of C (APL, Tcl/Tk, Forth), and as a hobbyist I have no real stake in the game.

Consider exploring Ada 2022 as a capable successor to Algol. Its well supported in GCC and scales well from very small to very large projects. Some information is at https://learn.adacore.com/ and https://alire.ada.dev/

Is like to order a complementary question to the sibling one. What are you going to add to (/remove from?) Algol 68 to get Algol 26?

I wonder about what you think is wrong with C? C is essentially a much simplified subset of ALGOL68. So what is missing in C?

Proper strings and arrays for starters, instead of being pointers that the programmer is responsible for doing length housekeeping.

Whilst I think that C has its place, my personal choice of Algol 26 or 27 would be CLU – a highly influential, yet little known and underrated Algol inspired language. CLU is also very approachable and pretty compact.

I've actually been toying with writing an Algol 68 compiler myself for a while.

While I doubt I'll do any major development in it, I'll definitely have a play with it, just to revisit old memories and remind myself of its many innovations.

If PL/I was like a C++ of the time, Algol-68 was probably comparable to a Scala of the time. A number of mind-boggling ideas (for the time), complexity, an array of kitchen sinks.

It certainly has quite a reputation, but I suspect it has more to do with dense formalism that was quite unlike everything else. The language itself is actually surprisingly nice for its time, very orthogonal and composable.

Finally.

I find this great, finally an easy way to play with ALGOL 68, beyond the few systems that made use of it, like the UK Navy project at the time.

Ironically, Algol 68 and Modula-2 are getting more contributions than Go, on GCC frontends, which seems stuck in version 1.18, in a situation similar to gcj.

Either way, today is for Algol's celebration.

Will it compile Knuth’s test? https://en.wikipedia.org/wiki/Man_or_boy_test

That test is short enough to just paste it in here:

    begin
      real procedure A(k, x1, x2, x3, x4, x5);
      value k; integer k;
      real x1, x2, x3, x4, x5;
      begin
        real procedure B;
        begin k := k - 1;
              B := A := A(k, B, x1, x2, x3, x4)
        end;
        if k ≤ 0 then A := x4 + x5 else B
      end;
      outreal(1, A(10, 1, -1, -1, 1, 0))
    end

The whole "return by assigning to the function name" is one of my least favorite features of Pascal, which I suppose got it from Algol 60. Where I'm confused though is, what is the initial value of B in the call to A(k, B, x1, x2, x3, x4)? I'm guessing the pass-by-name semantics are coming into play, but I still can't figure out how to untie this knot.

This is great news for GCC! I love how this decision supports older languages like Algol 68, keeping them alive in the FOSS world. It shows the hacker community's dedication to preserving diverse tools.

any algol tutorial recommendations? just to feel what's it all about

I would recommend the Informal Introduction to Algol 68, available in PDF at https://algol68-lang.org/resources

They can just fork off the Golang frontend and it would be the same, maybe patch the runtime a bit.

Being an old dog, as I mention elsewhere, I see a pattern with gcj.

GCC has some rules to add, and keep frontends on the main compiler, instead of additional branches, e.g. GNU Pascal never got added.

So if there is no value with maintenance effort, the GCC steering will eventually discuss this.

Does gcc even support go?

Until a few years ago, gccgo was well maintained and trailed the main Go compiler by 1 or 2 releases, depending on how the release schedules aligned. Having a second compiler was considered an important feature. Currently, the latest supported Go version is 1.18, but without Generics support. I don't know if it's a coincidence, but porting Generics to gccgo may have been a hurdle that broke the cadence.

The best thing about gccgo is that it is not burdened with the weirdness of golang's calling convention, so the FFI overhead is basically the same as calling an extern function from C/C++. Take a look at [0] and see how bad golang's cgo calling latency compare to C. gccgo is not listed there but from my own testing it's the same as C/C++.

[0]: https://github.com/dyu/ffi-overhead

Seems doubtful, given that generics and the gccgo compiler were both spearheaded by Ian Lance Taylor, it seems more likely to me that him leaving google would be a more likely suspect, but I don't track go.

This has been stagnant long before he left.

Yes, though language support runs behind the main Go compiler. https://go.dev/doc/install/gccgo

Wow that is cool. Pass by name. I always wanted to try it.

Algol60 had call by name, Algol68 doesn't really, it does have "proceduring" which creates a function to call when you pass an expression to a parameter that's a function pointer that has no parameters, you can use that to sort of do something like call by name but the expense is more obvious

Just pass a string and `eval` it.

Where might one look to find examples of such code? I've never found algol outside of wikipedia

You can find some modern Algol 68 code, using the modern stropping which is the default in GCC, at https://git.sr.ht/~jemarch/godcc

Godcc is a command-line interface for Compiler Explorer written in Algol 68.

https://rosettacode.org/wiki/Category:ALGOL_68

https://github.com/search?q=algol68&type=repositories

Without knowing what your interests/motivations and backgrounds are, it is hard to make good recommendations, but if you didn't know about rosettacode or github I figured I should start with that

Old papers and computer manuals from the 1960's.

Many have been digitalized throughout the years across Bitsavers, ACM/SIGPLAN, IEEE, or university departments.

Also heavily influenced languages like ESPOL, NEWP, PL/I and its variants.

[deleted]

[dead]

[flagged]