Losing 1½ Million Lines of Go

> Unfortunately, Go’s library doesn’t get updated every time Unicode does. As of now, January 2026, it’s still stuck at Unicode 15.0.0, which dates to September 2023; the latest version is 17.0.0, last September. Which means there are plenty of Unicode characters Go doesn’t know about, and I didn’t want Quamina to settle for that.

I have to say I am surprised about that. Does anyone have any context or guesses as to why this is the case?

EDIT: Go's unicode was actually updated to v17 yesterday:

https://github.com/golang/go/commit/dd39dfb534d2badf1bb2d72d...

5 hours ago, mroche

Based on the commit message's use of "CL", which is Google lingo for a Change List on their internal system, I bet this was already available in the internal version and was only ported to the GitHub version after someone pointed it out.

5 hours ago, matt3210

Much more prosaic (if slightly embarrassing), I'm afraid: The update was non-trivial (this CL is simple, but there are some accompanying ones in x/text which are not) and it didn't hit the top of the priority list for anyone who understands x/text.

Go is pretty much entirely developed in public; there are some Google-internal customizations but none of them are particularly exciting and almost all changes start in the open source repo and are imported from there.

5 hours ago, neild

"CL"/"Change List" is the lingo for the Gerrit code review tool, which is how all contributions to Go happen. Creating a GitHub PR simply triggers a bot to create a Gerrit CL, which is where all discussion about the "PR" happens and where the "accept" button gets clicked.

4 hours ago, LukeShu

Hard to get promoted at Google doing that

5 hours ago, watchful_moose

> Sure, these automata are “wide”, with lots of branches, but they’re also shallow, since they run on UTF-8 encoded characters whose maximum length is four and average length is much less

I would consider splitting this task into two:

- extracting the next Unicode code point

- determining whether it’s in the character class

For the second, instead of using an automaton, one could use a perfect hash (https://en.wikipedia.org/wiki/Perfect_hash_function). That could make that part branch-free.

Is that a good idea?