Vibe-Coded Ext4 for OpenBSD

> So as of today, the Copyright system does not have a way for the output of a non-human produced set of files to contain the grant of permissions which the OpenBSD project needs to perform combination and redistribution.

This seems extremely confused. The copyright system does not have a way to grant these permissions because the material is not covered under copyright! You can distribute it at will, not due to any sort of legal grant but simply because you have the ability and the law says nothing to stop you.

This all relies, as the article points out, on everyone looking directly at code that both looks like and works like the only extant codebase for EXT4 and nonetheless concluding that in fact the computer conjured it from the aether. If I wrote a program that zipped up the Linux kernel source, unzipped it, and grepped -v for comments it would not then be magically transformed into unattributable public domain software.

> If I wrote a program that zipped up the Linux kernel source, unzipped it, and grepped -v for comments it would not then be magically transformed into unattributable public domain software.

That's not the case here. A re-implemented piece of software that does not contain meaningful verbatim excerpts from the original is not subject to the copyright of the original.

that is not certain. if you read code and then reimplement it using the original code as reference, the claim has been made that this falls under the copyright of the original because the new code is derived from the old code. unfortunately this particular situation has not yet been tested in court. but clean room implementations are done specifically to avoid the risk reading the original code poses. if this was clear cut then clean room development would not be needed.

this is similar to creating an extension to some program, because the extension could not be written without the original even if the interface the extension is using is a public API. the claim has been made that the copyright of the original program applies. i think the linux kernel is an example here.

see also these questions on stackexchange:

https://softwareengineering.stackexchange.com/questions/2087...

https://softwareengineering.stackexchange.com/questions/8675...

> this is similar to creating an extension to some program

There's no such thing as "an extension to some program". A derivative work is a work that contains the original. Using the privileges provided by copyright law, the creator may impose licensing restrictions on how the original work is used - but that's contract law, not copyright.

For example the GPL and the AGPL define different sets of use restrictions, none of that matters in this case because the original work is not being reproduced or used per se.

As I already said in my other, down-voted comment - copyright is only about verbatim, or near verbatim copies, in whole or in part - it's the spirit that both judgment and the letter of the law are supposed to follow. Copying of functionality is not subject to copyright.

For example, one can use the same topic for a work of poetry for a similar aesthetic effect and that doesn't infringe other poems.

The GPL used a hack to stretch copyright law into a near opposite but stretching it further goes into absurd territory, achieving the opposite of what the GPL claims to protect.

a kernel driver is an extension to the kernel. yet, even with a clearly defined API it is a derived work of the kernel.

one can use the same topic for a work of poetry for a similar aesthetic effect and that doesn't infringe other poems

because the new poem does not depend on the original.

the kernel driver is useless without the kernel

What if one reverse engineered the original logic, for example translating the assembly code into a higher level language. They didn't use or look at the original code. Does that still count as "clean room"? What's the legal difference between that and deriving the logic just from observing how the running program acts?

there is no legal precedence that clarifies what clean room development is. clean room development is a precaution to stay away as far as possible from the original code in order to reduce the risk of infringement. clearly, not looking at the assembly code is better than looking at it.

Under the premise advanced in the quote, copyright is not being violated because there is none. Thus, the quote makes no sense as stated. It may be that, additionally, copyright is in fact being violated (I don't believe it myself), but if so that's a separate argument.

The premise of the quote does not contain the assumption that there is no copyright to the code. In fact the various contributors do not advance an opinion about whether code written by an AI can be granted copyright. Rather they are saying that it is obviously derivative of code that is under copyright, that is only distributed under terms which, however many dry cleaners process it, will still conflict with the license under which they publish their software.

> Rather they are saying that it is obviously derivative of code that is under copyright

Derivatives are not subject to copyright, unless they are close to, and contain substantial verbatim copies from, the original. It's a virtual certainty that a vibe-coded Ext4 FS is none of the above.

Redefining copyright as some weird patenting of similar ideas is absurd.

see my response here: https://news.ycombinator.com/item?id=47557250

Eh … the argument will likely be things created by Thing at the behest of Author is owned by the Author. It’ll take a few cases going through the courts, or an Act of Congress to solidify this stuff.

Just like we settled on photographers havin copyright on the works created by their camera. The same arguments seem to apply

The US Copyright Office has published a piece that argues otherwise, but a) unless they pass regulation their opinion doesn't really matter, and b) there is way too much money resting on the assumption code can be copyrighted despite AI involvement.

It's not settled. The monkey selfie copyright dispute ruled that a monkey that pressed the button to take a selfie, does not and cannot open the copyright to that photo, and neither does the photographer who's camera it was. How that extends to AI generated code is for the courts to decide, but there are some parallels to that case.

https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

But with the monkey there are two levels of separation from the artist: the human makes the creative decision to hand the camera to a monkey, who presses the trigger, and the camera makes the picture. Compared to the single layer of separation of a photographer choosing framing and camera parameters, pressing the trigger and the camera taking the picture. Or the zero levels of separation when the artist paints the picture.

A programmer writing code would be like the painter, and the programmer writing a prompt for Claude looks a lot like the photographer. The prompt is the creative work that makes it copyrightable, just like the artistic choices of the photographer make the photo copyrightable

You could argue that the prompt is more like a technical description than a creative work. But then the same should probably be true of the code itself, and consequently copyright should not apply to code at all

The copyright office's argument is that the AI is more like a freelancer than like a machine like a camera. Which you might equate to the monkey, who's also a bit freelancer like. But I have my doubts that holds up in court. Monkeys are a lot more sentient than AIs

The copyright office is pretty clear on this if you read: https://www.copyright.gov/ai/Copyright-and-Artificial-Intell....

There is case law surrounding the fact that just because you commission a work to another entity doesn't give you co-authorship, the entity doing the work and making creative decisions is the entity that gets copyright.

In order for you to have co-authorship of the commissioned work you have to be involved and pretty much giving instruction level detail to the real author. The opinion shows many cases that its not the case with how LLM prompts work.

The monkey selfie case is relevant also because since it also solidifies that non-persons cannot claim copyright, that means the LLM cannot claim copyright, and therefore it does not have copyright that can be passed onto the LLM operator.

The law is whatever it needs to be to satisfy monied interests with the degree of acceptable of adaptation being a function of the unity of those interests and the political ascendancy of those in favor.

Overwhelmingly this is in favor of treating ai as a tool like Photoshop.

Even those against AI disagree on different matters and will overwhelmingly want a cut not a different interpretation.

This filesystem driver was made by a human using AI, not a monkey.

Haven't there already been a few cases, each of which found that mechanically-produced works are not copywritable?

yes https://copyrightalliance.org/current-ai-copyright-cases-par...

Just because you can distribute something doesn't mean you aren't violating someone else's copyright. You cannot assume that just because a language model popped out some code for you that it is clear of any other claims.

This is just lazy copyright whitewashing.

> This seems extremely confused. The copyright system does not have a way to grant these permissions because the material is not covered under copyright!

This opinion is simplistic. LLMs are trained with pre-existing content, and their output directly reflects their training corpus. This means LLMs can generate output that matches verbatim existing work. And that work can very well be subjected to copyright.

Language models are good at translation and retrieval. This also extends to computer languages. LLMs translate from GPL to other licenses the same way Google translate turns French to English, except that the source material is implicitly stored in the LLM.

this is disputed. see my comment here, especially the stackexchange links: https://news.ycombinator.com/edit?id=47557250

The article is largely about the copyright concerns of LLM generated code that was almost certainly trained on the GPL original.

Also, it is essentially an ext2 filesystem as it does not support journaling.

Binaries are copyrightable in both the US and the EU, and they are not technically produced by a human either, they're produced by a computer program. I honestly don't understand why this isn't extended to AI-generated code. Isn't it the same thing? One could argue that compilers merely transform source code into binaries "as is," while AI models have some "knowledge" baked in that they extract and paste as code. But there are compilers that also generate binaries by selecting ready-to-use binary patches authored by compiler developers and combining them into a program. One could also argue that, in the case of compilers, at least the input source code is authored by a human. But why can't we treat prompts as "source code in natural language" too? Where is the line between authorship and non-authorship, and how is the line defined? "Your prompt was too basic to constitute authorship" doesn't sound like an objectibe criterion.

Maybe for lawyers, AI is some kind of magical thing on its own. But having successfully created a working inference engine for Qwen3, and seeing how the core loop is just ~50 lines of very simple matrix multiplication code, I can't see LLMs as anything more than pretty simple interpreters that process "neural network bytecode," which can output code from pre-existing templates just like some compilers. And I'm not sure how this is different from transpilers or autogenerated code (like server generators based on an OpenAPI schema)

Sure, if an LLM was trained on GPL code, it's possible it may output GPL-licensed code verbatim, but that's a different matter from the question of whether AI-generated code is copyrightable in principle.

Interestingly, I found an opinion here [0] that binaries technically shouldn't be copyrightable, and currently they are because:

  the copyright office listened to software publishers, and they wanted binaries protected by copyright so they could sell them that way

[0] https://freesoftwaremagazine.com/articles/what_if_copyright_...

That linked opinion overstates the case. In the real-world, two different programs performing any non-trivial but functionally identical task will look substantially dissimilar in their source code, and that dissimilarity will carry over to the compiled binary, meaning what was expressive (if anything) is largely preserved. To the extent two different programs do end up with identical code, then that aspect was likely primarily functional and non-copyrightable, or at least the expressive character didn't carry over to the binary. Ordering and naming of APIs in source code can be expressive, and that indeed is often lost (literally or at least the expressive character) during the compilation process, but there are other expressive aspects to software programing that will be preserved and protected in the binary form.

IMO, your intuition regarding AI is right--it's not a magic copyright laundering machine, and AFAIU courts have very quickly agreed that infringement is occurring. But in copyright law establishing infringement (or the possibility of infringement) is the easy, straight-forward part. Copyright infringement liability is a much more complex question. Transformative uses in particular are a Fair Use, and Fair Use is technically treated as an affirmative defense to infringement.[1] If something is Fair Use, infringement is effectively presumed. But Fair Uses are typically very fact-intensive questions, and unlike the case with search engines I'm not sure we'll get to the point where there's a well-defined fence protecting "AI".

[1] There's a scholarly pedantic debate about whether Fair Use is properly a "defense", rather than "exception" to infringement, but it walks and talks like a defense in the sense that the defendant has the burden of proving Fair Use after the plaintiff has established infringement. There's a similarly pedantic (though slightly more substantive) debate in criminal law regarding affirmative defenses. But the very term "affirmative defense" was coined to recognize and avoid these pedantic debates.

Wow that thread just kept going. Whilst the LWN article covered most of the "highlights" I think this reply from Theo is pretty suscient on the topic at large [1].

[1] https://marc.info/?l=openbsd-tech&m=177425035627562&w=2

> Lacking Copyright (or similarily a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now.

Thats awesome lmao

that's not a statement from a lawyer, and it's confused. there is one true thing in there which is that at least under US considerations the LLM output may not be copyrightable due to insufficient human involvement, but the rest of the implications are poorly extrapolated.

there are lots of portions of code today, prior to AI authorship, that are already not copyrightable due to the way they are produced. the existence of such code does not decimate the copyright of an overall collective work.

Can someone explain this to me? I was under the impression that if a work of authorship was not copyrightable because it was AI generated and not authored by a human, it was in the public domain and therefore you could do whatever you wanted with it. Normal copyright restrictions would not apply here.

Data theft of service or piracy from the web and "AI" users content are used in the model training sets, and when codified the statistical saliency is significant if popular content is present.

For example, when an LLM does a vector search, there is a high probability of pirated content bleed-though and isomorphic plagiarism in the high dimensional vector space results. Thus, often when you coincidentally type in "name a cartoon mouse", there is a higher probability Disney "Micky Mouse" will pop out in the output rather than "Mighty Mouse". Note Trademarks never expire if the fees are paid, and Disney can still technically sue anyone that messes with their mouse.

Much like em dashes "--", telling the current set of models to stop using them inappropriately often fails. Also, activation capping is used to improve the models behavioral vector, and have nothing to do with the Anthropic CEO developing political ethics.

LLM are useful for context search, but can't function properly without constantly stealing from actual humans. Thus, will often violate copyright, trademark, and patents. In a commercial context it is legally irrelevant how the output has misappropriated IP, and one can bet your wallet the lawyers won't care either. No, IP is not public domain for a long time (17 to 78 years) regardless of peoples delusions, even if some kid in a place like India (no software patents) thinks it is..

This channel offers several simplified explanations of the work being done with models, and Anthropic posts detailed research papers on its website.

https://www.youtube.com/watch?v=YDdKiQNw80c

https://www.youtube.com/watch?v=Xx4Tpsk_fnM

https://www.youtube.com/watch?v=JAcwtV_bFp4

Many YC bots are poisoning discourse -- so this thread will likely get negative karma. Some LLM users seem to develop emotional or delusional relationships with the algorithms. The internet is already >52% generated nonsense and growing. =3

This does not answer my question.

The quoted content said that "Lacking Copyright (or similarily a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now." I was explicitly asking how this meshed with my understanding of copyright, at least in the United States, which requires that a work of authorship be authored by a human and not by a machine; where a work is not authored by a human, copyright protection does not subsist, and therefore the respective work is in the public domain. And I was further asking for an explanation as to how including a work that is AI-generated (aka in the public domain) made "... that aggregate body becoming less free". Unless my understanding of copyright law and court precedent is massively off the mark, I am confused as to how less freedom is aforded in this instance.

The precedent case in the US formed a legal consensus that "AI" content can't be copyrighted, but it may also contain unlicensed/pirated IP/content.

Thus, one should not contaminate GPL/LGPL licensed source code with such content. The reason it causes problems is the legal submarines may (or may not if they settled out of court with Disney) surface at a later date, as the lawsuits and DMCA strikes hit publishers.

It doesn't mean people won't test this US legal precedent, as most won't necessarily personally suffer if a foundation gets sued out of existence for their best intentions/slop-push. =3

> Who is the copyright holder in this case? It clearly draws heavily from an existing work, and it's clear the human offering the patch didn't do it. It's not the AI, because only persons can own copyright. Is it the set of people whose work was represented in the training corpus? Was the it the set of people who wrote ext4 and whose work was in the training corpus? The company who own the AI who wrote the code? Someone else?

I don't love this take. Specifically:

> it's clear the human offering the patch didn't do it

I find it hard to believe that there wasn't a good bit of "blood, sweat, and tears" invested by a human directing the LLM to make this happen. Yes, LLMs can spit out full projects in 1 prompt but that's not what happened here. From his blog the work on this spanned 5 months at least. And while he probably wasn't working on it exclusively during that time, I find it hard to believe it was him sending "continue" periodically to an LLM.

Anyone who has built something large or complicated with LLM assistance knows that it takes more than just asking the LLM to accomplish your end goal, saying "it's clear the human offering the patch didn't do it" is insulting.

I've done a number of things with the help of LLMs, in all but the most contrived of cases it required knowledge, input from me, and careful guidance to accomplish. Multiple plans, multiple rollbacks, the knowledge of when we needed to step back and when to push forward. The LLM didn't bring that to the table. It brought the ability to crank out code to test a theory, to implement a plan only after we had gone 10+ rounds, or to function as grep++ or google++.

LLMs are tools, they aren't a magic "Make me ext4 for OpenBSD"-button (or at least they sure as hell aren't that today, or 5 months ago when this was started).

Regardless of license status, I'd be very hesitant to trust a vibe-coded filesystem implementation with my data.

Why did they even mention it was vibe-coded? Would it not be a lot harder for someone to prove that fact if you just didn't tell them?

Vibe coding and OpenBSD. The perfect combination.

Vibe coding and file systems are even better

trying to load with linux ext4 hmm doesn't load, but it works with my version!

Must be a bug in the linux kernel, let me git clone and build an out-of-tree module...

Kent Overstreet has already blazed that trail.

It's clearly an experiment.

[deleted]

I vibe-configured an Edgerouter 4 as a hot-drop box that would establish a secure tunnel and create a fake WAN for some servers that had to be temporarily pulled from service but remain operational in someones home garage. I overnight shipped it to them with two of the ports labeled, they plugged in home internet on one port, the rack on the other port, and it secure tunneled to a Linode VPS to get a public IP, circumventing all the Verizon home internet crap. I used OpenBSD. Claude did most of the work.

[deleted]

Can someone just copyright wash Windows already.

The Windows 2000 and Windows XP sources are readily available and must have made it into the training data. But most software has dropped XP support. You really need at least some of the Win 8 and Win 10 APIs to claim compatibility with modern software, and I doubt claude has seen those from the inside

ReactOS did this without any need for an LLM.

No they didn't. It would be copyright washing if someone contributed to ReactOS who remembered large portions of the Windows code and wrote the ReactOS implementations based on that.

I'd like to see it AFL fuzzed and compared to the original. Took 2 hours to first bug ten years ago in 2016.

Discussion then https://news.ycombinator.com/item?id=11469535

Mirror of the slides https://events.static.linuxfound.org/sites/events/files/slid...

As someone handling dozens of OpenBSD servers and VMs at work, I dont care about copyright and licenses anymore.

Its 2026, just shut up and give us at least one modern filesystem already!

I liked this reply in the thread :

There's another issue surrounding developer skill atrophy or stunting that I find \ particularly concerning on an existential level.

If we allow people to use LLMs to write code for a given project/platform, experience \ in that platform will potentially atrophy or under develop as contributors \ increasingly rely on out sourcing their applicable skills and decisions to "AI".

Even if you believe out sourcing the minutia of coding is a net positive, the \ "enshitification" principal in general should give you pause; as soon as the net \ developer skill for a project has degraded to a point of reliance, even somewhat, I \ think we can be confident those AI tools will NOT get less expensive.

I'd rather be independently less productive, than dependent on some MegaCorp(TM)'s \ good will to rent us back access to our brains at a fair price.

- achaean

https://marc.info/?l=openbsd-tech&m=177430829313972&w=2

~20 years ago, the Linux camp accused OpenBSD of importing GPL'd code (a wireless driver IIRC) and cried foul. The code was removed.

Fast forward to 2026, Theo says no to vibe-coded slop, prove to me your magic oracle LLM didn't ingest gobs of GPL code before spitting out an answer.

People are big mad of course, but you want me to believe Theo is the bad guy here for playing it conservatively?

The history is a bit backwards but the point is good. OpenBSD atheros wireless code was imported into linux, the BSD attributions were removed, and it was re-declared as GPL. That was later changed back.

https://marc.info/?l=linux-wireless&m=117579116031296&w=2

It is amusing to see that the only concern seems to be about a confusion around licensing, not the validity or maintainability of the code itself.

Eh, well, if your guns are trained on the "copyright" portion of the ship and you can sink it from there, no need to waste ammo or time trying to figure out if code bits are as explosive as the copyright bits are. Probably the code is just as sinkable, e.g. here's a recent response to some other AI slop:

  I didn't look closely at most of the code but one thing that caught my eye, pid is not safe for tempfile name generation, another user of the system can easily generate files that conflict with this. Functions like mktemp and mkstemp are there for a reason. Some of the other "safety" checks make no sense. If the LLM code generator is coming up with things which any competent unix sysadmin (let alone programmer) can tell are obviously wrong, it doesn't bode well for the rest.

https://marc.info/?l=openbsd-ports&m=177460682403496&w=2

The next AI winter can't come soon enough…

How is that different than a human writing the code? Whether an AI or a human wrote it, I would expect the same bar of validity/maintainability.

To me, SOTA is just bad at DRY, KISS, succint, well architected, top down, easy to test code and has to be constantly steered to come close. Even the article suggests that. YMMV.

TDD and strong goals help..

..much like with human development.

TDD makes the code test-passable, but it is still rng. As for goals, you can't foresee every stupid thing it will generate. It will look at a state machine, and rather than using the existing event structure, write its own loops and conditions. This is very different compared to human devs. No goal will help. You just keep yanking its chain until it generates as described. It can't even put imports at the top as you described. It can't help making circular refs in c++ despite being specifically told to use a hierarchical structure. Left alone you will get truly unstructured random mess.

People keep making trivial apps with open source examples thinking they found god. Another dismissive comment and I swear.

Because humans make design decisions, AI just bangs it's head against the problem until it gets something that "works".

Is it worth the effort to review until such implications are understood?

No of course not, bike shedding licenses is where it is at.

>incorporate knowledge carrying an illiberal license.

Good luck proving an LLM has "Knowledge", and isn't just a statistical model that tries to form outputs as a copy of it's training data...

> This obsession with copyrights between different free software ecosystems - who put the lawyers in charge?

This comment on the article is spot on. I don't vibe code or care about AI really, but it's so exhausting to see people playing lawyer in threads about LLM-generated code. No one knows, a ton of people are using LLMs, the companies behind these models torrented content themselves, and why would you spend your time defending copyright / use it as a tool to spread FUD? Copyright is a made up concept that exists to kill competition and protect those who suck at executing on ideas.

> Vibe-Coded Ext4 for OpenBSD

Who wants to test it ? Preferably on real hardware. /s

Paywalled article on something vibe-coded? That seems like a bold strategy.

click to continue

Well this is ironic, GPL advocate(s) declaring a clean implementation based on specifications infringing due to someone/something reading specs provided under license. Didn't Oracle lose that argument in court as pertains to Android implementation of Java libraries?

I'm not sure what you're reading; there is a distinct lack of GPL advocates in that conversation.