Choosing a Language Based on Its Syntax?

Semantics are where the rubber meets the road, certainly; but syntax determines how readable the code is for someone meeting it for the first time.

Contrast an Algol descendant like C, Pascal, Java, or even Python with a pure functional language like Haskell. In the former, control structure names are reserved words and control structures have a distinct syntax. In the latter, if you see `foo` in the body of a function definition, you can't tell from its appearance alone whether it's a simple computation or some sophisticated control structure. The former provides more clues, which makes it easier to decipher at a glance. (Not knocking Haskell here; it's an interesting language. But it's absolutely more challenging to read.)
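A concrete case: Haskell's Prelude defines a loop as an ordinary function, `until :: (a -> Bool) -> (a -> a) -> a -> a`, so a call to it looks exactly like any other function application. The Python below is only a rough transliteration for illustration (the variable name `first_power_over_100` is made up); it shows a loop hiding behind plain call syntax:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def until(done: Callable[[T], bool], step: Callable[[T], T], x: T) -> T:
    """Apply `step` to `x` repeatedly until `done(x)` holds.

    Mirrors the shape of Haskell's Prelude `until`; at the call site it
    looks no different from an ordinary computation.
    """
    while not done(x):
        x = step(x)
    return x


# Reads like any other call, yet it is a loop in disguise.
first_power_over_100 = until(lambda n: n > 100, lambda n: n * 2, 1)
```

Here `first_power_over_100` ends up as 128. In Haskell the same thing is just `until (> 100) (* 2) 1`, with no keyword hinting that iteration is going on.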

To put it another way, syntax is the notation you use to think. Consider standard math notation. I could define my own idiosyncratic notation for standard algebra and calculus, and there might even be a worthwhile reason for me to do that. But newcomers are going to find it much harder to engage with my work.

I absolutely agree about Haskell (and also OCaml). They both suffer from "word soup" due to their designers incorrectly thinking that removing "unnecessary" punctuation is a good idea, and Haskell especially suffers from "ooo this function could be an operator too!".

I am an S-exp enjoyer, and more for practical reasons than aesthetic ones—I really like the editor tooling that's possible with S-expressions. So I will absolutely choose a Lisp or a lisp if given the option, even at some level of inconvenience when it comes to the maturity of the language itself. I will always write Hy[0] rather than Python, for example.

[0] https://hylang.org

(I am aware of Combobulate[1] for Emacs folks, of which I'm sadly not one.)

[1] https://GitHub.com/mickeynp/combobulate

In the era of LLMs, syntax might matter more than you think.

The C form `type name;` is ambiguous because, depending on context, it could actually be more than one thing. Even worse if you include macro shenanigans. The alternative (roughly what Rust and Zig do), `var/const/mut name type`, is unambiguous.

For experts, who carry a rather long memory of what is going on in the codebase, this is more or less "not a problem". But an LLM's knowledge is limited to the content that currently exists in its context plus the conventions baked in from its training corpus, so for an LLM this matters. Of course, it is ALSO a problem for humans who are looking at a codebase for the first time, especially if the types are unusual.
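The classic illustration of the C ambiguity is a statement like `a * b;`: if `a` names a type, it declares `b` as a pointer to `a`; otherwise it multiplies `a` by `b` and discards the result. The toy classifier below is a hypothetical sketch (not how any real C compiler is structured) showing that the decision needs a table of known type names, while a keyword-led form can be classified from its first token alone:

```python
DECL_KEYWORDS = {"var", "const", "mut"}


def classify_c_statement(stmt: str, known_types: set[str]) -> str:
    """Classify a C statement of the form `a * b;`.

    The answer depends on `known_types`, i.e. on typedefs that may live
    far away from this line (or behind a macro).
    """
    first = stmt.split("*", 1)[0].strip()
    if first in known_types:
        return "declaration"  # `b` is declared as a pointer to type `a`
    return "expression"  # `a` times `b`, with the result discarded


def classify_keyword_statement(stmt: str) -> str:
    """Classify a keyword-led statement using only its first token."""
    first = stmt.split(maxsplit=1)[0]
    return "declaration" if first in DECL_KEYWORDS else "expression"
```

The same characters flip meaning depending on a `typedef` that may be in another file, which is exactly the context a reader without the whole codebase (human or LLM) may lack.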

I hope that someday LLMs will interact with code mostly via language servers rather than by reading the code itself, which frequently confuses the LLM, as you've noted, and is also simply a waste of tokens.

Why? I suspect that writing code itself is extremely token-efficient (unless your keywords happen to be silly alien text).

This is an underappreciated point. I work across a lot of codebases and the difference in how well AI coding tools handle Rust vs JavaScript vs Python is striking — and syntax ambiguity is a big part of it.

The `type name` vs `let name: type` distinction matters more than it seems. When the grammar is unambiguous, the LLM can parse intent from a partial file without needing the full compilation context that a human expert carries in their head. Rust and Go are notably easier for LLMs to work with than C or C++ partly because the syntax encodes more structure.

The flip side: syntax that is too terse becomes opaque to LLMs for the same reason it becomes opaque to humans. Point-free Haskell, APL-family languages, heavy operator overloading — these rely on the reader holding a lot of context that does not exist in the immediate token window.

I wonder if we will see new languages designed with LLM-parseability as an explicit goal, the way some languages were designed for easy compilation.

Humans also have limited context. For LLMs it's mostly a question of pipeline engineering to pack the context and system prompt with the most relevant information, and allow tool use to properly understand the rest of the codebase. If done well I think they shouldn't have this particular issue. Current AI coding tools are mostly huge amounts of this pipeline innovation.

The syntax of a language is its poetic form: it defines things like meter, scansion, and rhyme scheme. Of course people are going to have strong aesthetic opinions on it, just as there are centuries of arguments in poetry over which form is best. You can make great programs in any language, just as you can make beautiful poetry in almost every form. (Leaving an "almost" there for people who dislike limericks, I suppose.) Language choice is one of the (sometimes too few) creative choices we can make in any project.

> Another option is to do something like automatic semicolon insertion (ASI) based on a set of rules. Unfortunately, a lot of people’s first experience with this kind of approach is JavaScript and its really poor implementation of it, which means people usually just write semicolons regardless to remove the possible mistakes.

The joke, though, is that the biggest ASI-related mistakes in JavaScript aren't solved by adding more semicolons: it's the places where the language inserts semicolons you didn't expect that trip you up the worst. The single biggest mistake is putting a newline after the `return` keyword and before the return value, which accidentally turns the statement into a bare `return` (returning `undefined`) rather than returning the value.

In general, JS is actually a lot closer to the Lua example than many people want to believe. There's really only one ASI-related rule to remember when dropping semicolons in JS (and it's a lot like that Lua rule of thumb): the Winky Frown rule. If a line starts with a frown, it must wink: ;( ;[ ;`

(It has a silly name because it keeps it easy to remember.)
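Both hazards described above (the bare `return` and the Winky Frown rule) are mechanical enough to check line by line. The sketch below is a hypothetical lint pass written in Python, not a real JavaScript parser: it ignores strings, comments, and multi-line template literals, and it applies the Winky Frown rule conservatively to every line, as stated.

```python
FROWNS = ("(", "[", "`")


def asi_hazards(source: str) -> list[tuple[int, str]]:
    """Flag two semicolon-free JavaScript hazards, returning (line number, kind).

    A rough heuristic over physical lines, not a parser.
    """
    hazards = []
    lines = source.splitlines()
    for i, raw in enumerate(lines):
        line = raw.strip()
        # `return` alone on a line: ASI ends the statement right there, so
        # the value on the next line is never returned (you get undefined).
        if line == "return" and i + 1 < len(lines) and lines[i + 1].strip():
            hazards.append((i + 1, "bare return"))
        # Winky Frown: a line opening with ( [ or ` would be glued onto the
        # previous expression, so it must start with `;` instead.
        if line.startswith(FROWNS):
            hazards.append((i + 1, "frown without wink"))
    return hazards


SAMPLE = (
    "function f() {\n"
    "  return\n"
    "    { ok: true }\n"
    "}\n"
    "const a = b\n"
    "(a || c).run()\n"
    ";[1, 2].forEach(log)\n"
)
```

On `SAMPLE`, this flags line 2 (the bare `return`) and line 6 (the unwinking frown), while line 7 passes because it starts with `;`. Being conservative, it would also flag a harmless `(` line right after an opening `{`.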

> Lua is an example of such a language, and when a semicolon is necessary is when you have something that could be misconstrued as being a call:

    (function() print("Test1") end)(); -- That semicolon is required
    (function() print("Test2") end)()

Tangential, but I sidestepped this ambiguity in a language I've been designing on the side, via the simple rule that the function being called and the opening parenthesis can't have whitespace between them (e.g. "f()" is fine but "f ()" or "f\n()" is not). Ditto for indexing ("x[y]"). If these characters are encountered after whitespace, the parser considers it the beginning of a new expression.

By sacrificing this (mostly unused, in practice) syntactic flexibility, I ended up not needing any sort of "semicolon insertion" logic - we just parse expressions greedily until they're "done" (i.e. until the upcoming token is not an operator).
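The rule described above (a call or index binds only when `(` or `[` touches the callee, and expressions are parsed greedily while the next token is an operator) fits in a small recursive-descent parser. This is a hypothetical Python sketch of the idea, not the commenter's actual implementation: it supports only identifiers, numbers, `+ - * /`, parentheses, single-argument calls, and indexing, with no operator precedence.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

BINARY_OPS = {"+", "-", "*", "/"}
POSTFIX_OPENERS = {"(", "["}


@dataclass
class Tok:
    text: str
    spaced: bool  # True if any whitespace (spaces or newlines) preceded it


def lex(src: str) -> list[Tok]:
    # Identifiers/numbers, or any single non-space character.
    return [Tok(m.group(2), bool(m.group(1)))
            for m in re.finditer(r"(\s*)(\w+|\S)", src)]


class Parser:
    def __init__(self, src: str) -> None:
        self.toks = lex(src)
        self.pos = 0

    def peek(self) -> Tok | None:
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def at(self, text: str) -> bool:
        tok = self.peek()
        return tok is not None and tok.text == text

    def advance(self) -> Tok:
        if self.pos >= len(self.toks):
            raise SyntaxError("unexpected end of input")
        tok = self.toks[self.pos]
        self.pos += 1
        return tok

    def expect(self, text: str) -> None:
        tok = self.advance()
        if tok.text != text:
            raise SyntaxError(f"expected {text!r}, got {tok.text!r}")

    def primary(self):
        tok = self.advance()
        if tok.text == "(":
            node = self.expression()
            self.expect(")")
        elif re.fullmatch(r"\w+", tok.text):
            node = tok.text
        else:
            raise SyntaxError(f"unexpected {tok.text!r}")
        # A call or index binds only when `(` / `[` is glued to what precedes it.
        while (nxt := self.peek()) and nxt.text in POSTFIX_OPENERS and not nxt.spaced:
            self.advance()
            if nxt.text == "(":
                args = [] if self.at(")") else [self.expression()]
                self.expect(")")
                node = ("call", node, *args)
            else:
                node = ("index", node, self.expression())
                self.expect("]")
        return node

    def expression(self):
        node = self.primary()
        # Greedy: keep going while the upcoming token is a binary operator.
        while (nxt := self.peek()) and nxt.text in BINARY_OPS:
            op = self.advance().text
            node = (op, node, self.primary())
        return node

    def program(self) -> list:
        exprs = []
        while self.peek() is not None:
            exprs.append(self.expression())
        return exprs
```

With this rule, `Parser("(g)()\n(h)()").program()` yields two separate calls, `[("call", "g"), ("call", "h")]`, so the Lua-style case needs no semicolon at all, while `f (x)` parses as two expressions rather than a call.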

Syntax is what keeps me away from Rust. I have tried many times to get into it over the years but I just don't want to look at the syntax. Even after learning all about it, I just can't get over it. I'm glad other people do fine with it but it's just not for me.

For this reason (coming from C++), I wish Swift were more popular, because its syntax is much more familiar/friendly to me, while it also has better memory safety and quality-of-life improvements that I like.

Definitely second this sentiment. Rust just... Looks wrong. And for that reason alone I've never tried to get into it.

I understand exactly how shallow that makes me sound, and I'm not about to try and defend myself.

Wow, this is one of the most surprising comments I've ever read on HN!

Personally, I bucket C++ and Rust and Swift under "basically the same syntax." When I think about major syntax differences, I'm thinking about things like Python's significant indentation, Ruby's `do` and `end` instead of curly braces, Haskell's whitespace-based function calls, Lisp's paren placement, APL's symbols, etc.

Before today I would have assumed that anyone who was fine with C++ or Rust or Swift syntax would be fine with the other two, but TIL this point exists in the preference space!

Swift's syntax may look nice, but as soon as you run into "The compiler is unable to type-check this expression in reasonable time; try breaking up the expression into distinct sub-expressions" you'll forget all of that. Hint: they are related.

Do you have some examples of what you couldn't get along with? I know this is a lot to ask, but while I do write Rust, and don't write C++ or Swift in volume (only small examples), to me the syntax just doesn't feel that different.

If you do like Swift you might want to just bite the bullet and embrace the Apple ecosystem. That would be my recommendation I think.

This resonates so much with my relationship with Rust, and with Go as well. I'm having a hard time learning Rust's advanced concepts because of its syntax.

Strangely enough I find Lisp's parentheses much more attractive.

An article about diversity of language syntax that somehow only deals with C-adjacent curly-brace languages (and, tbf, Odin).

This is a blinkered viewpoint. If you want to talk about syntax, at least mention the Haskell family (Elm, Idris, F*, etc.), Smalltalk, and the king of syntax(-less) languages, LISP (and Scheme), which teach us that syntax is a data structure.

Language syntax is like the weather. When it's good (or when you're acclimated to it, I guess), you don't notice it. When the weather is perfect, you don't feel like the atmosphere even exists. When a language is ingrained in your mental models, you don't notice the syntax; you just see semantics.

A language's syntax and its error messages are its user interface. Yes, you can have a good tool that you don’t enjoy looking at. You can also have a good tool that’s frustrating to learn because its user interface isn’t clear and doesn’t do what you expect. Can I not hope for something that does what I need, is easy to use, and looks good?

I dislike the “you can change the syntax” argument because that just doesn’t happen. Closest thing is a new language that compiles to another.

As Ken Iverson noted in "Notation as a Tool of Thought"[1], yeah, syntax absolutely matters. The same program might resonate and make sense in one language but be incomprehensible if translated 1:1 into another.

Computer languages are for humans to understand and communicate.

1. https://www.eecg.utoronto.ca/~jzhu/csc326/readings/iverson.p...

I like the semantics of what you type into the Google search bar when using it for impromptu calculations. You can use ^ to raise to a power, for example. Just type sin 45. It’s all least surprise.

I love how it works with units too, e.g. c/433MHz or 4 bytes * 20hz * 24 hours
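For reference, here is what those two queries work out to when done by hand in Python, where the units survive only as constant names and comments (a hand-rolled sketch, not how the search engine evaluates them):

```python
SPEED_OF_LIGHT = 299_792_458  # m/s, exact by the SI definition of the metre
MHZ = 1e6  # Hz
SECONDS_PER_HOUR = 3600

# c / 433 MHz: (m/s) / (1/s) leaves metres.
wavelength_m = SPEED_OF_LIGHT / (433 * MHZ)

# 4 bytes * 20 Hz * 24 hours: Hz (1/s) times hours (converted to s) cancels out.
bytes_per_day = 4 * 20 * 24 * SECONDS_PER_HOUR
```

That comes to a wavelength of roughly 0.69 m (about 69 cm) and 6,912,000 bytes (about 6.9 MB) per day. The appeal of the calculator is that it does the unit conversion and cancellation for you.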

Being that casual, it probably has a syntactic ceiling. Least surprise won’t work for very complex programs. But maybe the programs wouldn’t be so complex if you didn’t have to stick together complex program syntax either.

Syntax, or how humans perceive the syntax, is only a very small part of the problems when designing a programming language. There is a lot more about how a compiler would handle the syntax (efficiently) and about how the syntax affects actual code and ecosystem.

The recent Go blog post on error handling should make it clear that syntax is often not worth worrying about. https://go.dev/blog/error-syntax

Major syntactic structures definitely have an influence on my language choices. Outside of compilation and runtime model, modeling the domain (both data and procedures) changes drastically between paradigms. Syntax is what enables or hamstrings different modeling paradigms.

My two biggest considerations when picking a language are:

- How well does it support value semantics? (Can I pass data around as data and know that it is owned by the declaring scope, or am I chained to references, potential nulls, and mutable handles with lifetimes I must consider? Can I write expression-oriented code?)

- How well does it support linear pipelining for immutable values? (If I want to take advantage of value semantics, there needs to be a way to express a series of computations on a piece of data in order, with no strange exceptions because one procedure or another is a compiler-magic symbol that can't be mapped, reduced, filtered, etc. In other words, piping operators or Universal Function Call Syntax.)

I lean on value semantics, expression-oriented code, and pipelining to express lots of complex computations in a readable and maintainable manner, and if a language shoots me in the foot there, it's demoralizing.
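Python has neither a pipe operator nor UFCS, so as a rough stand-in, the sketch below combines frozen dataclasses (for value semantics) with a small hypothetical `pipe` helper built on `functools.reduce` (for in-order pipelining). The `Order` type and its helper functions are invented purely for illustration.

```python
from dataclasses import dataclass, replace
from functools import reduce
from typing import Any, Callable


@dataclass(frozen=True)
class Order:
    items: tuple[int, ...]  # prices in cents
    discount_pct: int = 0


def pipe(value: Any, *steps: Callable[[Any], Any]) -> Any:
    """Thread `value` through `steps` left to right, like `value |> f |> g`."""
    return reduce(lambda acc, step: step(acc), steps, value)


def add_item(price: int) -> Callable[[Order], Order]:
    # `replace` builds a new Order; the input value is never mutated.
    return lambda o: replace(o, items=o.items + (price,))


def with_discount(pct: int) -> Callable[[Order], Order]:
    return lambda o: replace(o, discount_pct=pct)


def total(o: Order) -> int:
    return sum(o.items) * (100 - o.discount_pct) // 100


start = Order(items=(1000,))
result = pipe(start, add_item(500), with_discount(10), total)
```

Here `result` is 1350 (1500 cents minus 10%), and `start` still holds only the original item. Every step is an ordinary function, so nothing in the chain needs special compiler treatment to be mapped or composed.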

Python doesn't have automatic semicolon insertion.
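Python's newline is itself the statement terminator, and newlines inside brackets simply don't count (implicit line joining); nothing is "inserted". The standard-library `tokenize` module makes this visible: it emits a `NEWLINE` token only at the end of each logical line, and an `NL` token for a newline that merely continues a logical line, such as the one inside the parentheses below.

```python
import io
import tokenize

SOURCE = "x = (1 +\n     2)\ny = 3\n"

tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))

# NEWLINE ends a logical line (a statement); NL is a newline that does not,
# such as one inside brackets.
statement_ends = [t.start[0] for t in tokens if t.type == tokenize.NEWLINE]
```

Three physical lines produce two statements: `statement_ends` is `[2, 3]`, because the newline after `1 +` falls inside the parentheses.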

A lot of programming is taste, and syntax gives you a very quick judgement about how good the language designer's taste is. How familiar they are with what we know about which syntax works well, and so on. For example if you're designing a language in 2026 that uses `type name` instead of `name: type`... that is highly suspicious.

Also syntax is the interface through which you interact with the language, so bad syntax is going to be something annoying you have to deal with constantly. Sure you'll be able to write good programs in a language with bad syntax choices, but it's going to be less fun.

> Odin’s rules, which are very similar to Python’s, are to ignore newline-based “semicolons” within brackets (( ) and [ ], and { } used as an expression or record block).

Honestly I always thought that was a bit crap in Python and I'm surprised anyone thought this was a sensible thing to copy. Really, just use semicolons. As soon as an "easy" rule becomes even vaguely difficult to remember it's better to bin it and just require explicitness, because overall that is easier.
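To see how much (or how little) machinery the quoted bracket rule involves, here is a hypothetical source-to-source sketch in Python that turns newlines into `;` except inside `( )` or `[ ]`. It is a simplification: Odin's real lexer works on tokens and also special-cases `{ }` in expression or record position, which this ignores, and strings and comments are not handled.

```python
CLOSER_FOR = {"(": ")", "[": "]"}


def newline_semicolons(source: str) -> str:
    """Treat each newline as `;` unless it sits inside ( ) or [ ]."""
    out = []
    expected_closers = []  # stack of closers for the currently open brackets
    for ch in source:
        if ch in CLOSER_FOR:
            expected_closers.append(CLOSER_FOR[ch])
        elif expected_closers and ch == expected_closers[-1]:
            expected_closers.pop()
        if ch == "\n":
            # Inside brackets the newline is just whitespace.
            out.append(" " if expected_closers else ";")
        else:
            out.append(ch)
    return "".join(out)
```

For example, `newline_semicolons("x := foo(1,\n  2)\ny := 3\n")` gives `"x := foo(1,   2);y := 3;"`: the first newline is swallowed because it sits inside the call's parentheses.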

I have really just one wish when it comes to syntax: no syntactically significant whitespace. Space, newline, tab, etc. should ALL map to the same exact token. In practice this also means semicolons or something like them are needed as well, to separate expressions/statements. I dislike langs that try to insert semicolons for you, but at least it's better than the alternative.
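A toy illustration of that wish, assuming a hypothetical language whose tokens are space-separated and whose statements always end in `;`: Python's `str.split()` with no argument already treats every run of spaces, tabs, and newlines as the same separator, so layout cannot change meaning.

```python
def statements(source: str) -> list[list[str]]:
    """Split source into statements; any whitespace run is just a separator.

    A toy lexer for a hypothetical semicolon-terminated language, assuming
    tokens are already separated by whitespace.
    """
    result: list[list[str]] = []
    current: list[str] = []
    for word in source.replace(";", " ; ").split():
        if word == ";":
            result.append(current)
            current = []
        else:
            current.append(word)
    if current:
        raise SyntaxError("missing ';' at end of input")
    return result
```

`"x = 1; y = 2;"` and `"x = 1;\n\ty =\n  2 ;"` produce the same statements, which is the property being asked for; the price is the explicit `;`.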

The way Python treats whitespace is a huge design mistake that has probably wasted something like a century's worth of time (if not more) across all users, on something really trivial.

That's one of the things I like about C, the independence in how one can write code. I was able to develop my own style thanks to that, visualising the structure of the code to distinguish the different parts of statements and make it more clear (at least to myself).

(edited several times to try to correct changes in formatting for an example here, but it's just screwed up :-/ )