Lix – universal version control system for binary files

Lix is also a soft fork of the official Nix package manager implementation: https://lix.systems/

Lix is also the name of a computer science and mathematics laboratory: https://www.lix.polytechnique.fr/

I really assumed that this was that; even calling it a universal version control system for binary files would be kind of a weird way of describing it but is plausibly a valid description for the package manager.

Git can display diffs between binary files using custom diff drivers:

> Put the following line in your .gitattributes file: *.docx diff=word

> This tells Git that any file that matches this pattern (.docx) should use the “word” filter when you try to view a diff that contains changes. What is the “word” filter? You have to set it up [in .gitconfig].

https://git-scm.com/book/en/v2/Customizing-Git-Git-Attribute...
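For illustration, the converter behind such a driver could be a small script. This is a minimal sketch, not the tool the Pro Git chapter uses: the function name `docx_to_text` is made up here, and it only reads the main document body (no headers, footnotes or comments).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the paragraphs of a .docx body as plain text, one per line."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W + "p"):
        # A paragraph's visible text is spread over one or more <w:t> runs.
        lines.append("".join(t.text or "" for t in paragraph.iter(W + "t")))
    return "\n".join(lines)
```

Git calls a `textconv` program with the file path as its only argument and diffs whatever it prints to stdout, so a wrapper that prints `docx_to_text(sys.argv[1])` could be registered via the `diff.word.textconv` setting.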

In their 'Git is unsuited for applications' blog post[0] they also say the following:

> We currently have to clone the whole repository just to edit translation files. That is problematic for big repositories. The repository for posthog.com for example is ~680MB in size. Even though we only need translation files which would be at max 1MB in size, we have to clone the whole repository. That is also one of the reasons why git is not used at Facebook, Google & Co which have repository sizes in the gigabytes.

I get that it can be a bit complex, but Git can handle this circumstance pretty easily if you know how (or write a script for it).

For example, cloning the GIMP repo from GitLab takes me about 56 seconds and uses up 632 MB on disk, using just `git clone <repo>`.

In comparison, running these commands:

    git clone --quiet --filter=blob:none --sparse https://gitlab.gnome.org/GNOME/gimp.git gimp-sparse-clone
    git -C gimp-sparse-clone sparse-checkout add po po-libgimp po-plug-ins po-python po-script-fu po-tags po-tips po-windows-installer

(You can also run `git sparse-checkout init --no-cone` and then just `git sparse-checkout add '*.po'`, quoted so the shell doesn't expand the pattern, to grab every .po file in the repo and nothing else)

Takes 14 seconds on my laptop and uses 59 MB of disk space, and checks out only the specified directories and their contents.

So yeah, it's not as automatic as one might like but ship a shell script to your translators and you're good to go. The 'Git can't do X' arguments are mostly untrue; it should really be 'Getting git to do X is more complicated than I would prefer' or 'Explaining how to do X in git is a pain', both of which are legitimate complaints.

[0] https://samuelstroschein.com/blog/git-limitations/
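If the script you ship should also work where a POSIX shell isn't available, the same two commands can be wrapped in a few lines of Python. This is only a sketch of that idea: the function names are invented, and it does nothing beyond running the git commands shown above.

```python
import subprocess


def sparse_clone_commands(repo_url, dest, paths):
    """Build the two git invocations for a blobless, sparse clone."""
    return [
        ["git", "clone", "--quiet", "--filter=blob:none", "--sparse", repo_url, dest],
        ["git", "-C", dest, "sparse-checkout", "add", *paths],
    ]


def sparse_clone(repo_url, dest, paths):
    """Run the commands in order; check=True stops at the first failure."""
    for command in sparse_clone_commands(repo_url, dest, paths):
        subprocess.run(command, check=True)
```

Building the argument lists separately from running them keeps the wrapper easy to inspect before a translator executes anything.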

This is great for showing diffs. To actually make git store only deltas, not entire binaries, you would need to configure "clean" and "smudge" filters for the format. Given that docx (and xlsx) are a bunch of XML files compressed by zip, you can actually have clean diffs, and small commits.
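One way such a filter pair could look, as a rough sketch rather than a tested recipe: the "clean" side rewrites the zip with uncompressed, sorted entries and fixed timestamps, so that git's own pack delta compression has stable bytes to work with, and the "smudge" side compresses again on checkout. The function names are made up, and it assumes the consuming application doesn't care about entry order or timestamps inside the archive.

```python
import io
import zipfile

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # constant timestamp so the output is reproducible


def rezip(data, compression):
    """Rewrite a zip archive with sorted entries, fixed timestamps and one compression method."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", compression=compression) as dst:
        for name in sorted(src.namelist()):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = compression
            dst.writestr(info, src.read(name))
    return out.getvalue()


def clean(data):
    """What gets stored in the repository: entries left uncompressed."""
    return rezip(data, zipfile.ZIP_STORED)


def smudge(data):
    """What lands in the working tree: entries deflated again."""
    return rezip(data, zipfile.ZIP_DEFLATED)
```

Git filters read the file from stdin and write the result to stdout, and are registered through the `filter.<name>.clean` and `filter.<name>.smudge` settings plus a `filter=<name>` attribute in .gitattributes. Note the checked-out file will not be byte-identical to what was originally committed, only equivalent in content.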

This is really great. I read the Git config article, but I thought the image diff example was kinda lackluster. I'm sure some better metrics could be extracted for a more descriptive diff.

Thanks for sharing!

Yeah, this is how I would prefer to solve this problem personally, but it would be really nice to have some collection of tools that cover common binary file formats automatically instead of having to configure this manually every time.

Would be interesting to see some tooling built around being a custom diff driver for a bunch of different standard formats!

I found this in my GitHub stars: https://github.com/xltrail/git-xl?tab=readme-ov-file

And then there is also Pandoc that I guess could be helpful in this regard.

Holy moly. I just went to bed. Checking my phone for the last time. Opening hackernews for "one last scroll" and see lix, my project, popping up here.

Going through the questions now. So much for going to bed.

Learnings from the comments so far: I need to refine the positioning of lix.

Lix is not a replacement for git. Nor does it target version controlling code as the primary use case.

A better positioning might be "version control system as a library". The primary use case is embedding lix into applications, AI agents, etc. that need version control.

I need to go to bed now. I have a flight to catch in 6 hours.

PS I am open to suggestions regarding the positioning!

Home page states Lix can diff "any file format like .xlsx, .pdf, .docx".

Wow, sounds useful. Git doesn't do that out of the box.

BUT... the list of available "plugins" only has .csv, .md and .json, which are things that git already handles just fine?

Can it actually diff excel and word and PDF or not?

It can but the plugins are not developed for production readiness yet. I should clarify that.

The way to write a plugin:

Take an off the shelf parser for pdf, docx, etc. and write a lix plugin. The moment a plugin parses a binary file into structured data, lix can handle the version control stuff.
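To make the "parse into structured data, then track changes" idea concrete, here is a sketch in Python using JSON as the stand-in format. This is not the lix plugin API; `flatten`, `detect_changes` and the shape of the change records are all invented for illustration.

```python
import json

MISSING = object()  # marks "this path does not exist on that side"


def flatten(value, prefix=""):
    """Map a nested JSON value to {path: leaf value}."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        flat.update(flatten(child, f"{prefix}/{key}"))
    return flat


def detect_changes(before_bytes, after_bytes):
    """Compare two versions of a file and report one change per differing path."""
    before = flatten(json.loads(before_bytes))
    after = flatten(json.loads(after_bytes))
    changes = []
    for path in sorted(before.keys() | after.keys()):
        if before.get(path, MISSING) != after.get(path, MISSING):
            changes.append(
                {"entity": path, "before": before.get(path), "after": after.get(path)}
            )
    return changes
```

For a PDF or docx the `json.loads` step would be replaced by whatever off the shelf parser exists for that format; the comparison part stays the same.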

Same name as my Phoenix inspired framework for go: https://codeberg.org/lixgo/lix

I wonder how much room this leaves for unintended, not shown changes. E.g. Excel is a complex format that allows all sorts of metadata and embeddings that would not always show up as cell changes ...

Depends on the diff you render and what the plugin tracks.

In general, lix gives an API to track changes in any file format (via plugins). The "diff noise" thus depends on a) the plugin, i.e. does it track the metadata? and b) what is rendered as the diff.

If the user doesn't care about seeing a diff of metadata in Excel, don't render the metadata in the diff. The latter is trivial because diffing in lix is just a SQL query.
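Roughly what "don't render the metadata" could look like once changes live in a SQL table. The sketch uses Python's sqlite3 with a made-up `change_log` table; the table and column names are assumptions for illustration, not the actual lix schema.

```python
import sqlite3


def make_demo_db():
    """In-memory database with a hypothetical table of tracked changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE change_log (entity TEXT, kind TEXT, old_value TEXT, new_value TEXT)"
    )
    conn.executemany(
        "INSERT INTO change_log VALUES (?, ?, ?, ?)",
        [
            ("B4", "cell", "pending", "shipped"),
            ("lastModifiedBy", "metadata", "alice", "bob"),
            ("C7", "cell", "10", "12"),
        ],
    )
    return conn


def changes_to_render(conn, hidden_kind="metadata"):
    """Return every tracked change except those of the kind the user doesn't want to see."""
    rows = conn.execute(
        "SELECT entity, old_value, new_value FROM change_log "
        "WHERE kind <> ? ORDER BY entity",
        (hidden_kind,),
    )
    return rows.fetchall()
```

The metadata change is still tracked; it is only left out of what gets rendered.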

name confusing it be

https://lix.systems/

Name collision. I thought it might be the "Lix" fork of "Nix".

It seems to me that this is just an issue of diff features. Git can be extended to show semantic diffs of binary files and it doesn't technically need a completely new VCS.

Since git is the most popular VCS right now and will likely stay that way for the foreseeable future, I don't think incompatibility with git is a good design choice.

Indeed, if lix were to target code version controlling, incompatibility with git is a “dead on arrival” situation.

But Lix's use case is not version controlling code.

It’s embedding version control in applications. Hence, the reason why lix runs within SQL databases. Apps have databases. Lix runs on top of them.

The benefit for the developer is a version control system within their database, and exposing version control to users.

I look at the page and leave without any clue as to what it actually does. Agents and AI are mentioned so I assume it might just be incoherent slop?

The person behind this boasts on Twitter that they fired all their remote developers and used AI instead.

Judging by tweets, this project is 2-3 years in the making.

> Lix is a universal version control system that can diff any file format (.xlsx, .pdf, .docx, etc).

> Unlike Git's line-based diffs, Lix understands file structure. Lix sees price: 10 → 12 or cell B4: pending → shipped, not "line 4 changed" or "binary files differ".

How? I have a custom binary file format, how would Lix be able to interpret this?

> Lix adds a version control system on top of SQL databases that let's you query virtual tables like file, file_history, etc. via plain SQL. These table's are version controlled.

What does SQL have to do with everything?

Thanks for the feedback.

AI agents are the main reason right now why version control is needed outside of software engineering.

The mistake in the blog post is triggering comparisons to git, which leads to “why is this better/different than git?”.

If you have a custom binary file, you can write a plugin for it! :)

Lix runs on top of a SQL database because we initially built lix on top of git but needed:

- database semantics (transactions, ACID, etc.)

- SQL to express history queries (diffing arbitrary file formats can't be solved with a simple diff() API)
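On the custom binary format question, the parsing half of such a plugin can be quite small. Here is a sketch for an entirely made-up format (a 4-byte magic `INV1` followed by fixed-size records); the format, the names and the entity keys are assumptions for illustration only, and the real plugin interface may look different.

```python
import struct

MAGIC = b"INV1"
RECORD = struct.Struct("<HI")  # little-endian: uint16 item id, uint32 price in cents


def parse_inventory(data):
    """Turn the binary file into {entity key: value} so changes can be tracked per item."""
    if data[:4] != MAGIC:
        raise ValueError("not an INV1 file")
    return {
        f"item/{item_id}": price
        for item_id, price in RECORD.iter_unpack(data[4:])
    }
```

Once two versions of the file are parsed into dictionaries like this, comparing them key by key gives changes at the level of "item 2: 250 → 300" instead of "binary files differ".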

Great semantic diffs, but does Lix actually define a merge algebra for concurrent structured edits, or are conflicts just punted back to humans? How does its SQL engine guarantee deterministic merges vs last-write-wins?

Merge algebra is similar to git with a three way merge. Given that lix tracks individual changes, the three way merge is more fine grained.

In case of a conflict, you can either decide to do last write wins or surface the conflict to the user e.g. "Do you want to keep version A or version B?"

The SQL engine is unrelated to merging. Lix uses SQL as storage and query engine, but not for merges.
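A sketch of what a three way merge at the level of individual tracked values could look like, with both conflict strategies mentioned above. This is a generic illustration in Python, not lix's implementation; the function name and the `resolve` callback are invented.

```python
MISSING = object()  # marks a key that is absent (never existed or was deleted)


def three_way_merge(base, ours, theirs, resolve=None):
    """Merge two edited copies of `base` key by key.

    `resolve(key, ours_value, theirs_value)` picks a winner for conflicts.
    Without it, conflicting keys keep their base value and are reported back.
    """
    merged, conflicts = {}, []
    for key in sorted(base.keys() | ours.keys() | theirs.keys()):
        b = base.get(key, MISSING)
        o = ours.get(key, MISSING)
        t = theirs.get(key, MISSING)
        if o == t:        # both sides agree (same edit, or both untouched)
            value = o
        elif o == b:      # only theirs changed this key
            value = t
        elif t == b:      # only ours changed this key
            value = o
        elif resolve is not None:
            value = resolve(key, o, t)
        else:             # real conflict: surface it to the user
            conflicts.append(key)
            value = b
        if value is not MISSING:
            merged[key] = value
    return merged, conflicts
```

Last write wins is then just `resolve=lambda key, ours_value, theirs_value: theirs_value`, while leaving `resolve` unset returns the list of keys to ask the user about.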

Hi, before you get too wedded to the name, you should be aware that there's already a major nix project called lix: https://lix.systems/.

Before clicking, I assumed this was actually a new feature of theirs that would apply nix build principles of some sort to version control of binaries.

They should change the name while they still can https://lix.systems/

Great name! :)

Looks cool, but seems kind of weird that it only works through an sdk. Should there be a cli or something?

Edit: Oh I see. Seems like their use case is embedding version control into another application.

Correct. Lix has been developed with the embedded use-case in mind.

Someone can write a CLI for it. Though, the primary use case is not code version control but embedding into applications.

I wonder if this could be used in conjunction with git for UE5 projects

It's nice, but it needs to support the most common file formats used in gamedev to gain enough traction.

Git is a command line program so it feels strange that this doesn't seem to support that use case.

Hi,

I'm the creator of lix.

Lix doesn't target code version control. It can be used for it. But the primary use case is embedding version control in applications. Such an application can be an AI agent that modifies files which entails the need to show what the agent did in that file e.g. tracking the changes.

Git is good enough for code. I don't think there is space to gain much market share.

Some feedback about the primary use case.

Your Lix doc (LLM written but with typos?) is sort of weird, handwaving how Lix does version control over, say, Excel, to say it's about working with SQL databases:

How does Lix work?

Lix adds a version control system on top of SQL databases that let's you query virtual tables like file, file_history, etc. via plain SQL. These table's are version controlled.

Then it gets weirder:

Why this matters:

Lix doesn't reinvent databases — durability, ACID, and corruption recovery are handled by battle-tested SQL databases.

This seems like a left turn from the value prop and why the value prop matters?

A firm-wide audit trail of changes to typically opaque file types (M365 files in particular) could be tremendously valuable -- and additive -- compared to the versioning that's baked into the file bundles. The version control is already embedded by the app, what adds value is reporting on or managing that from outside the app.

As for how it works, both in the docs and in the comment I'm replying to, it's unclear how any of this interacts with the native version control embedded in M365 apps or why this tool can be trusted as effective at tracking M365 content changes.

Does the following make more sense to you in respect to SQL?

Lix uses SQL databases as storage and query engine. Aka you get a filesystem on top of your SQL database that is version controlled.

Or, the analogy to git: Git uses the computer's filesystem as its storage layer. Lix uses a SQL database (with the nice benefit of making everything queryable via SQL).

> Lix doesn't reinvent databases — durability, ACID, and corruption recovery are handled by battle-tested SQL databases.

>> This seems like a left turn from the value prop and why the value prop matters?

Better wording might be "Lix uses a SQL database as storage layer"?

The SQL part is crucial for two reasons. First, the hard parts like data loss guarantees, transactions, etc. are taken care of by the database, so we don't have to build custom stuff. Second, that reduces the risk for adopters that data loss can occur with lix.

> As for how it works, both in the docs and in the comment I'm replying to, it's unclear how any of this interacts with the native version control embedded in M365 apps or why this tool can be trusted as effective at tracking M365 content changes.

It doesn't interact with version control in M365.

I'll update the positioning. Lix is a library to embed version control in whatever developers are building. Right now, lix is mostly interesting for startups that build AI-first solutions. They run into the problem "how do customers verify the changes AI agents make?".

The angle of universal version control, and using docx or Excel as an example, triggers the wrong comparisons. By no means is Lix competing with Sharepoint or existing version control solutions for MS Office.

Based on the product description, it seems that they don't like text, and want to deal in objects. It would feel strange if they did support a terminal, rather than a GUI.

because it's a stupid content tracker. see man git.

for office files one can also unzip and zip to store them in git as plaintext

It's a pity Word doesn't open its own OOXML export. At least LibreOffice has .fodt.

> It's a pity Word doesn't open its own OOXML export

They can’t. It’s the only thing keeping them relevant.

It was initially hard for me to understand how this could work but it looks like there is a plugin system?

Yes. The tracking works via plugins to keep it generic. Here is a rough illustration:

File change -> Plugin (detects changes) -> Lix

It works surprisingly well because most standard file formats have off the shelf parsers. Parse a file format and, voilà, it is trivial to diff. Then pass on a standard schema for changes to lix and you end up with a generic API to query changes.

Weird sales pitch. I think Git is super mediocre and a VCS that supports binary files would be awesome.

But then the first thing it talks about is diffing files. Which honestly shouldn’t even be a feature of VCS. That’s just a separate layer.

> But then the first thing it talks about is diffing files. Which honestly shouldn’t even be a feature of VCS. That’s just a separate layer.

There is nuance between git line by line diffing and what lix does.

For text diffing it holds true that diffing is a separate layer. Text files are small in size which allows on the fly diffing (that's what git does) by comparing two docs.

On the fly diffing doesn't work for structured file formats like xlsx, fig, dwg etc. It's too expensive. Both in terms of materializing two files at specific commits, and then diffing these two files.

What lix does under the hood is tracking individual changes, _which allows rendering a diff without on the fly diffing_. So lix is kind of responsible for the diffs but only in the sense that it provides a SQL API to query changes between two states. How the diff is rendered is up to the application.
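To illustrate the difference, here is a sketch of querying changes between two states from an append-only change log, without rebuilding either file. It uses Python's sqlite3 and an invented `change_log(version, entity, value)` table; the schema and query are assumptions for illustration and not how lix stores things internally.

```python
import sqlite3

# For every entity touched after from_version up to to_version,
# look up its latest value as of each of the two versions.
DIFF_SQL = """
WITH touched AS (
    SELECT DISTINCT entity FROM change_log
    WHERE version > :from_version AND version <= :to_version
)
SELECT entity,
       (SELECT c.value FROM change_log AS c
         WHERE c.entity = touched.entity AND c.version <= :from_version
         ORDER BY c.version DESC LIMIT 1) AS old_value,
       (SELECT c.value FROM change_log AS c
         WHERE c.entity = touched.entity AND c.version <= :to_version
         ORDER BY c.version DESC LIMIT 1) AS new_value
FROM touched
ORDER BY entity
"""


def open_demo_log():
    """In-memory log: each row records one entity's new value at one version."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE change_log (version INTEGER, entity TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO change_log VALUES (?, ?, ?)",
        [
            (1, "B4", "pending"), (1, "C7", "10"),
            (2, "B4", "shipped"),
            (3, "C7", "12"), (3, "D1", "new"),
        ],
    )
    return conn


def diff_between(conn, from_version, to_version):
    """Return (entity, old_value, new_value) rows for everything touched in between."""
    params = {"from_version": from_version, "to_version": to_version}
    return conn.execute(DIFF_SQL, params).fetchall()
```

The cost of the query depends on how many changes were recorded between the two versions, not on the size of the file, which is the point being made about large structured formats.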

> On the fly diffing doesn't work for structured file formats like xlsx, fig, dwg etc. It's too expensive. Both in terms of materializing two files at specific commits, and then diffing these two files.

I don’t think that’s actually true?

How often are binary files being diffed? How long does it take to materialize? How long to run a diff algorithm?

I’ve worked with some tools that can diff images. Works great. Not a problem in need of solving.

In any case I’ll give benefit of the doubt that this project solves some real problem in a useful way. I’m not sure what it is.

My goals in a VCS for binary files seem to be very very very different than yours.

Through the gitattributes and gitconfig files, git can be extended to work with any external tool for specific file types. For example: https://github.com/ewanmellor/git-diff-image

Most version control systems that are not Git support binary. In the industry you most often see Perforce P4 and Subversion being used for that purpose.

Correct. Perforce is expensive AF and is also kinda meh. They got bought by private equity and haven’t meaningfully improved it for like 15 years. But they’ve got gamedevs by the balls who don’t have an alternative. It’s unfortunate.

compelling problem statement. md and csv have their limits.
