A measured, comprehensive, and sensible take. Not surprising from Bryan. This was a nice line:
> it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open.
I think Oxide didn't include this in the RFD because they exclusively hire senior engineers, but in an organization that includes junior engineers I'd add something specific to help them understand how they should approach LLM use.
Bryan has 30+ years of challenging software (and now hardware) engineering experience. He memorably said that he's worked on and completed a "hard program" (an OS), which he defines as a program you doubt you can actually get working.
The way Bryan approaches an LLM is super different to how a 2025 junior engineer does so. That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.
I remember in the very first class I ever took on Web Design the teacher spent an entire semester teaching "first principles" of HTML, CSS and JavaScript by writing it in Notepad.
It was only then that she introduced us to the glory that was Adobe Dreamweaver, which (obviously) increased our productivity tenfold.
Dreamweaver absolutely destroyed the code with all kinds of tags and unnecessary stuff, especially if you used the visual editor. It was fun for brainstorming, but plain Notepad with clean, understandable code was far, far better (and, given the browser compatibility issues, the only option if you were going to production).
The HTML generated by Dreamweaver's WYSIWYG mode might not have been ideal, but it was far superior to the mess produced by MS FrontPage. With Dreamweaver, it was at least possible to use it as a starting point.
After 25 or so years doing this, I think there are two kinds of developers: craftsmen and practical “does it get the job done” types. I’m the former. The latter seem to be what makes the world go round.
It takes both.
I miss Dreamweaver. Combining it with Fireworks was a crazy productive combo for me back in the mid 00’s!
My first PHP scripts and games were written using nothing more than Notepad too, funnily enough.
Back in the early 00s I brought gvim.exe on a floppy disk to school because I refused to write XSLT, HTML, CSS, etc without auto-indent or syntax highlighting.
> That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.
Years ago I had to spend many months building nothing but Models (as in MVC) for a huge data import / ingest the company I worked at was rewriting. It was just messy enough that it couldn't be automated. I almost lost my mind from the dull monotony and even started having attendance issues. I know today that could have been done with an LLM in minutes. Almost crazy how much time I put into that project compared to if I did it today.
The issue is that it might look good, but an LLM often inserts weird mistakes.
Or ellipses.
Or overindexes on the training data.
If someone is not careful, it is easy to completely wreck the codebase by piling on seemingly innocuous commits.
So far I have developed a good sense for when I need to push the LLM to avoid sloppy code. It is all in the details.
But a junior engineer would never find/anticipate those issues.
I am a bit concerned, because the kind of software I am making is something an LLM would never come up with on its own.
A junior cannot make it; it requires research and programming experience that they do not have.
But I know that if I were a junior today, I would probably try to use LLMs as much as possible, and would probably know less programming over time.
So it seems to me that we are likely to have worse software over time.
Perhaps a boon for senior engineers, but how do we train junior devs in that environment? Force them to build slowly, without LLMs? Is it aligned with business incentives?
Do we create APIs expecting the code to be generated by LLMs or written by hand?
Because the impact of verbosity is not necessarily the same.
LLMs don't get tired as fast as humans.
> That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.
This gives me somewhat of a knee jerk reaction.
When I started programming professionally in the 90s, the internet came of age, and I remember being told "in my days, we had books and we remembered things", which of course is hilarious because today you can't possibly retain ALL the knowledge needed to be a software engineer, given the sheer amount of knowledge required to produce a meaningful product. It's too big and it moves too fast.
There was this long argument that you should know things and not have to look it up all the time. Altavista was a joke, and Google was cheating.
Then syntax highlighting came around and there'd always be a guy going "yeah nah, you shouldn't need syntax highlighting to program, your screen looks like a Christmas tree".
Then we got stuff like auto-complete, and it was amazing, the amount of keystrokes we saved. That too, was seen as heresy by the purists (followed later by LSP - which many today call heresy).
That reminds me also, back in the day, people would have entire encyclopaedia collections on DVDs. Did they use them? No. But they criticised Wikipedia for being inferior. Look at today, though.
Same thing with LLMs. Whether you use them as a powerful context-based auto-complete, as a research tool faster than Wikipedia and Google, as a rubber-duck debugger, or as a text generator -- who cares: this is today, stop talking like a fossil.
It's 2025 and junior developers can't work without LSP and LLMs? It's fine. They're not in front of a 386 DX33 with one book of K&R C and a blue EDIT screen. They have massive challenges ahead of them, the IT world is in complete shambles, and it's impossible to decipher how anything is made, even open source.
Today is today. Use all the tools at hand. Don't shame kids for using the best tools.
We should be talking about sustainability of such tools rather than what it means to use them (cf. enshittification, open source models etc.)
It is not clear, though, which tools enable and which tools inhibit your development at the beginning of your journey.
Agreed, although LLMs definitely qualify as enabling developers compared to <social media, Steam, consoles, and other distractions> of today.
The Internet itself is full of distractions. My younger self spent a crazy amount of time on IRC. So it's not different than spending time on say, Discord today.
LLMs have a pretty direct parallel with Google: the quality of the response has much to do with the quality of the prompt. If anything, it's the overwhelming nature of LLMs that might be the problem. Back in the day, if you had, say, library access, the problem was knowing what to look for. Discoverability with LLMs is exponential.
As for LLM as auto-complete, there is an argument to be made that typing a lot reinforces knowledge in the human brain like writing. This is getting lost, but with productivity gains.
Watching my juniors constantly fight the nonsense auto completion suggestions their LLM editor of choice put in front of them, or worse watching them accept it and proceed to get entirely lost in the sauce, I’m not entirely convinced that the autocompletion part of it is the best one.
Tools like Claude Code with ask/plan mode seem to be better in my experience, though I absolutely do wonder about the lack of typing causing a lack of memory formation.
A rule I set myself a long time ago was to never copy paste code from stack overflow or similar websites. I always typed it out again. Slower, but I swear it built the comprehension I have today.
> Watching my juniors constantly fight the nonsense auto completion suggestions their LLM editor of choice put in front of them, or worse watching them accept it and proceed to get entirely lost in the sauce, I’m not entirely convinced that the autocompletion part of it is the best one.
That's not an LLM problem, they'd do the same thing 10 years ago with stack overflow: argue about which answer is best, or trust the answer blindly.
> but I swear it built the comprehension I have today.
For interns/junior engineers, the choice is: comprehension VS career.
And I won't be surprised if most of them go with career now, and comprehension... well, thanks, maybe tomorrow (or never).
I don’t think that’s the dichotomy. I’ve been in charge of hiring at a few companies, and comprehension is what I look for 10 times out of 10.
>"in my days, we had books and we remembered things" which of course is hilarious
It isn't hilarious, it's true. My father (now in his 60s), who came from a blue-collar background with very little education, taught himself programming by manually copying and editing software out of magazines, like a lot of people his age.
I teach students now who have access to all the information in the world, but a lot of them are quite literally so scatterbrained and heedless that they can't process anything that isn't catered to them. Not having working focus and memory is like having muscle atrophy of the mind; you just turn into a vegetable. Professors across disciplines have seen a decline in student abilities, and for several decades now, not just due to LLMs.
> When I started programming professionally in the 90s, the internet came of age, and I remember being told "in my days, we had books and we remembered things", which of course is hilarious because today you can't possibly retain ALL the knowledge needed to be a software engineer, given the sheer amount of knowledge required to produce a meaningful product. It's too big and it moves too fast.
But I mean, you can get by without memorizing stuff, sure, but memorizing stuff does work out your brain and does help out in the long run? Isn't it possible we've reached the cliff with "helpful" tools, to the point that we are atrophying enough to be worse at our jobs?
Like, reading is surely better for the brain than watching TV. But constant cable TV wasn't enough to ruin our brains. What if we've got to the point it finally is enough?
As usual with Oxide's RFDs, I found myself vigorously head-nodding while reading. Somewhat more rarely, I hit a part I found myself disagreeing with:
> Unlike prose, however (which really should be handed in a polished form to an LLM to maximize the LLM’s efficacy), LLMs can be quite effective writing code de novo.
Don't the same arguments against using LLMs to write one's prose also apply to code? Was the structure of the code, and the ideas within it, the engineer's? Or was it from the LLM? And so on.
Before I'm misunderstood as an LLM minimalist, I want to say that I think they're incredibly good at solving blank-page syndrome -- just getting a starting point on the page is useful. But the code you actually want to ship is so far from what LLMs write that I think of them more as a crutch for blank-page syndrome than as "good at writing code de novo".
I'm open to being wrong and want to hear any discussion on the matter. My worry is that this is another one of the "illusion of progress" traps, similar to the one that currently fools people with the prose side of things.
I guess to follow up slightly more:
- I think the "if you use another model" rebuttal is becoming like the No True Scotsman of the LLM world. We can get concrete and discuss a specific model if need be.
- If the use case is "generate this function body for me", I agree that that's a pretty good use case. I've specifically seen problematic behavior for the other ways I'm seeing it OFTEN used, which is "write this feature for me", or trying to one shot too much functionality, where the LLM gets to touch data structures, abstractions, interface boundaries, etc.
- To analogize it to writing: They shouldn't/cannot write the whole book, they shouldn't/cannot write the table of contents, they cannot write a chapter, IMO even a paragraph is too much -- but if you write the first sentence and the last sentence of a paragraph, I think the interpolation can be a pretty reasonable starting point. Bringing it back to code for me means: function bodies are OK. Everything else gets questionable fast IME.
One difference is that clichéd prose is bad and clichéd code is generally good.
Depends on what your prose is for. If it's for documentation, then prose which matches the expected tone and form of other similar docs would count as clichéd from this perspective. I think this is a really good use of LLMs: making docs consistent across a large library / codebase.
I have been testing agentic coding with Claude Opus 4.5, and the problem is that it's too good at documentation and test cases. It's thorough to the point that it goes out of scope, so I have to edit its output down to improve the signal-to-noise ratio.
The “change capture”/straitjacket-style tests LLMs like to output drive me nuts. But humans write those all the time too, so I shouldn’t be that surprised either!
A problem I’ve found with LLMs for docs is that they are like ten times too wordy. They want to document every path and edge case rather than focusing on what really matters.
It can be addressed with prompting, but you have to fight this constantly.
I think probably my most common prompt is "Make it shorter. No more than ($x) (words|sentences|paragraphs)."
Docs also often don’t have anyone’s name on them, in which case they’re already attributed to an unknown composite author.
Writing is an expression of an individual, while code is a tool used to solve a problem or achieve a purpose.
The more examples an LLM's dataset contains of different types of problems being solved in similar ways, the better it gets at solving problems. Generally speaking, if a solution works well, it gets used a lot, so "good solutions" become well represented in the dataset.
Human expression, however, is diverse by definition. The expression of the human experience is the expression of a data point on a statistical field with standard deviations the size of chasms. An expression of the mean (which is what an LLM does) goes against why we care about human expression in the first place. "Interesting" is a value closely paired with "different".
We value diversity of thought in expression, but we value efficiency of problem solving for code.
There is definitely an argument to be made that LLM usage fundamentally restrains an individual from solving unsolved problems. It also doesn't consider the question of "where do we get more data from".
>the code you actually want to ship is so far from what LLMs write
I think this is a fairly common consensus, and my understanding is that the reason for this issue is the limited context window.
I argue that the intent of an engineer is contained coherently across the code of a project. I have yet to get an LLM to pick up on the deeper idioms present in a codebase that help constrain the overall solution towards these more particular patterns. I’m not talking about syntax or style, either. I’m talking about e.g. semantic connections within an object graph, understanding what sort of things belong in the data layer based on how it is intended to be read/written, etc. Even when I point it at a file and say, “Use the patterns you see there, with these small differences and a different target type,” I find that LLMs struggle. Until they can clear that hurdle without requiring me to restructure my entire engineering org they will remain as fancy code completion suggestions, hobby project accelerators, and not much else.
I recently published an internal memo which covered the same point, but I included code. I feel like you still have a "voice" in code, and it provides important cues to the reviewer. I also consider review to be an important learning and collaboration moment, which becomes difficult with LLM code.
> I think that the code you actually want to ship is so far from what LLMs write
It depends on the LLM, I think. A lot of people have a bad impression of them as a result of using cheap or outdated LLMs.
In my experience, LLMs have been quite capable of producing code I am satisfied with (though of course it depends on the context — I have much lower standards for one-off tools than long-lived apps). They are able to follow conventions already present in a codebase and produce something passable. Whereas with writing prose, I am almost never happy with the feel of what an LLM produces (worth noting that Sonnet and Opus 4.5’s prose may be moving up from disgusting to tolerable). I think of it as prose being higher-dimensional — for a given goal, often the way to express it in code is pretty obvious, and many developers would do essentially the same thing. Not so for prose.
> Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it. Moreover, once in the loop of peer review, generation should more or less be removed: if code review comments are addressed by wholesale re-generation, iterative review becomes impossible.
My general procedure for using an LLM to write code, which is in the spirit of what is advocated here, is:
1) First, feed the existing relevant code into an LLM. This is usually just a few source files in a larger project
2) Describe what I want to do, either giving an architecture or letting the LLM generate one. I tell it to not write code at this point.
3) Let it speak about the plan, and make sure that I like it. I will converse to address any deficiencies that I see, and I almost always do.
4) I then tell it to generate the code
5) I skim & test the code to see if it's generally correct, and have it make corrections as needed
6) Closely read the entire generated artifact at this point, and make manual corrections (occasionally automatic corrections like "replace all C style casts with the appropriate C++ style casts", followed by a review of the diff)
The hardest part for me is #6, where I feel a strong emotional bias towards not doing it, since I am not yet aware of any errors compelling such action.
This allows me to operate at a higher level of abstraction (architecture) and remove the drudgery of turning an architectural idea into precise, written code. But, when doing so, you are abandoning those details to a non-deterministic system. This is different from, for example, using a compiler or a higher-level VM language. With those tools, you can understand how they work, quickly form a good idea of what you're going to get, and rely on robust assurances. Understanding LLMs helps, but not to the same degree.
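For what it's worth, steps 1 through 4 could even be scripted. Here's a minimal sketch, assuming the OpenAI Python SDK; the model name, file paths, and prompts are made up for illustration, not taken from my actual workflow:

    # Rough sketch of steps 1-4; model, files, and prompts are placeholders.
    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    MODEL = "gpt-4o"   # hypothetical model choice

    # 1) Feed in the existing relevant code -- just a few source files.
    context = "\n\n".join(
        f"// {p}\n{Path(p).read_text()}" for p in ["src/parser.cc", "src/ast.h"]
    )

    history = [
        # 2) Describe the change, and forbid code for now.
        {"role": "user", "content": context + "\n\nI want to split parsing from "
         "AST construction. Propose an architecture. Do NOT write code yet."},
    ]

    # 3) Read the plan and push back until I like it (iterated by hand in practice).
    plan = client.chat.completions.create(model=MODEL, messages=history)
    history.append({"role": "assistant", "content": plan.choices[0].message.content})

    # 4) Only once the plan looks right, ask for the code.
    history.append({"role": "user", "content": "The plan looks good; generate the code."})
    code = client.chat.completions.create(model=MODEL, messages=history)
    print(code.choices[0].message.content)

    # Steps 5 and 6 -- skimming, testing, and the close line-by-line read -- stay human.

In practice I do this in a chat window rather than a script, but the shape is the same: code in, plan out, plan approved, then code out.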
I've found that your step 6 takes the vast majority of the time I spend programming with LLMs. Like 10X+ the combined total of time steps 1-5 take. And that's if the code the LLM produced actually works. If it doesn't work (which happens quite often), then even more handholding and corrections are needed. It's really a grind. I'm still not sure whether I am net saving time using these tools.
I always wonder about the people who say LLMs save them so much time: Do you just accept the edits they make without reviewing each and every line?
You can have the tool start by writing an implementation plan describing the overall approach and key details, including references, snippets of code, a task list, etc. Reviewing and refining that plan to make sure it matches your intent is much faster than reviewing a raw diff. Once that's acceptable the changes are quick, and having the machine do a few rounds of refinement to make sure the diff vs HEAD matches the plan helps iron out some of the easy issues before human eyes show up. The final review is then easier because you are only checking for smaller issues and consistency with the plan that you already signed off on.
It's not magic though, this still takes some time to do.
I exclusively use the autocomplete in Cursor. I hate reviewing huge chunks of LLM code at one time. With the autocomplete, I’m in full control of the larger design and am able to quickly review each piece of LLM code. Very often it generates what I was going to type myself.
Anything that involves math or complicated conditions I take extra time on.
I feel I’m getting code written 2 to 3 times faster this way while maintaining high quality and confidence.
Don’t make manual corrections.
If you keep all edits to be driven by the LLM, you can use that knowledge later in the session or ask your model to commit the guidelines to long term memory.
I've had the same thought about 'written' text with an LLM: if you didn't spend time writing it, don't expect me to read it. I'm glad he seems to be taking a hard stance on that, saying they won't use LLMs to write non-code artifacts. This principle extends to writing code as well, to some degree. You shouldn't expect other people to peer review 'your' code which was simply generated, because, again, you spent no time making it. You have to be the first reviewer. Whether these cultural norms are held firmly remains to be seen (I don't work there), but I think they represent thoughtful application of emerging technologies.
> it is presumed that of the reader and the writer, it is the writer that has undertaken the greater intellectual exertion. (That is, it is more work to write than to read!)
This applies to natural language, but, interestingly, the opposite is true of code (in my experience and that of other people that I've discussed it with).
See: Kernighan's Law (https://www.laws-of-software.com/laws/kernighan/)
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
I think people misunderstand this quote. Cleverness in this context refers to complexity, and generally stems from falling in love with some complex mechanism you dream up to solve a problem rather than challenging yourself to create something simpler and easier to maintain. Bolting together bits of LLM-created code is far more likely to be “clever” rather than good.
What an amazing quote!
> LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well.
I think this touches on a key point... but I'm not sure of the right way to articulate it.
A human-written comment may be worth something, but an LLM-generated one is cheap/worthless.
The nicest phrase capturing the thought I saw was: "I'd rather read the prompt".
It's probably just as good to let an LLM generate it again, as it is to publish something written by an LLM.
I'll give it a shot.
Text, images, art, and music are all methods of expressing our internal ideas to other human beings. Our thoughts are the source, and these methods are how they are expressed. Our true goal in any form of communication is to understand the internal ideas of others.
An LLM expresses itself in all the same ways, but the source doesn't come from an individual - it comes from a giant dataset. This could be considered an expression of the aggregate thoughts of humanity, which is fine in some contexts (like retrieval of ideas and information highly represented in the data/world), but not when presented in a context of expressing the thoughts of an individual.
LLMs express the statistical summation of everyone's thoughts. They present the mean, when what we're really interested in are the data points a couple of standard deviations away from the mean. That's where all the interesting, unique, and thought-provoking ideas are. Diversity is a core of the human experience.
---
An interesting paradox is the use of LLMs for translation into a non-native language. LLMs are actively being used to express an individual's ideas better than they can with their limited language proficiency, but those of us on the receiving end expect the expression to mirror the source, and have immediate suspicions about the legitimacy of the individual's thoughts. Which is a little unfortunate for those who just want to express themselves better.
I think more people should read Naur's "Programming as Theory Building".
A comment is an attempt to more fully document the theory the programmer has. Not all theory can be expressed in code. Both code and comment are lossy artefacts that are "projections" of the theory into text.
LLMs currently, I believe, cannot have a theory of the program. But they can definitely perform a useful simulacrum of such. I have not yet seen an LLM generated comment that is truly valuable. Of course, lots of human generated comments are not valuable either. But the ceiling for human comments is much, much higher.
> assurance that the model will not use the document to train future iterations of itself.
Believing this in 2025 is really fascinating. This is like believing Meta won’t use info they (il)legally collected about you to serve you ads.
I would have expected at least some consideration of public perception, given the extremely negative opinions many people hold about LLMs being trained on stolen data. Whether it's an ethical issue or a brand hazard depends on your opinions about that, but it's definitely at least one of those currently.
I made the mistake of first reading this as a document intended for everyone, rather than an internal document that simply happens to be public.
This is a technical document that is useful in illustrating how the guy who gave a talk once that I didn’t understand but was captivated by and is well-respected in his field intends to guide his company’s use of the technology so that other companies and individual programmers may learn from it too.
I don’t think the objective was to take any outright ethical stance, but to provide guidance about something ostensibly used at an employee’s discretion.
He speaks of trust and LLMs breaking that trust. Is this not what you mean, but by another name?
> First, to those who can recognize an LLM’s reveals (an expanding demographic!), it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open. But there are deeper problems: LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well. If the prose is automatically generated, might the ideas be too? The reader can’t be sure — and increasingly, the hallmarks of LLM generation cause readers to turn off (or worse).
> Specifically, we must be careful to not use LLMs in such a way as to undermine the trust that we have in one another
> our writing is an important vessel for building trust — and that trust can be quickly eroded if we are not speaking with our own voice
The guide is generally very well thought out, but I see an issue in this part:
It sets the rule that things must be actually read when there’s a social expectation (code interviews for example) but otherwise… remarks that use of LLMs to assist comprehension has little downside.
I find two problems with this:
- there is an incoherence there. If LLMs are flawless at reading and summarization, there is no difference from reading the original. And if they aren’t flawless, then that flaw also extends to the non-social stuff.
- in practice, I haven’t found LLMs so good as reading assistants. I’ve sent them to check a linked doc and they’ve just read the index and inferred the context, for example. Just yesterday I asked for a comparison of three technical books on a similar topic, and it wrongly guessed the third one rather than following the three links.
There is a significant risk in placing a translation layer between content and reader.
> Just yesterday I asked for a comparison of three technical books on a similar topic, and it wrongly guessed the third one rather than follow the three links.
I would consider this a failure in their tool use capabilities, not their reading ones.
To use them to read things (without relying on their much less reliable tool use), take the thing and put it in the context window yourself.
They still aren't perfect of course, but they are reasonably good.
Three whole books likely exceed their context window size, of course; I'd take this as a sign that they aren't up to a task of that magnitude yet.
I wonder if they would be willing to publish the "LLMs at Oxide" advice, linked in the OP [1], but currently publicly inaccessible.
> Ironically, LLMs are especially good at evaluating documents to assess the degree that an LLM assisted their creation
Is there any evidence for this?
>Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it
By this article's own standards, now there are 2 authors who don't understand what they've produced.
Nobody has yet explained how an LLM can be better than a well-paid human expert.
A well paid human expert can find lots of uses of LLMs. I'm still not convinced that humans will ever be totally replaced, and what work will look like is human experts using LLMs as another tool in the toolbox, just like how an engineer would have used a slide rule or mechanical calculator back in the day. The kind of work they're good at doesn't cover the full range of necessary engineering tasks, but they do open up new avenues. For instance, yesterday I was able to get the basic gist of three solutions for a pretty complex task in about an hour. The result of that was me seeing that two of them were unlikely to work for what I'm doing, so that now I can invest actual effort in the third solution.
The not needing to pay it well.
The empathy section is quite interesting
> When debugging a vexing problem one has little to lose by using an LLM — but perhaps also little to gain.
This probably doesn't give them enough credit. If you can feed an LLM a list of crash dumps, it can do a remarkable job producing both analyses and fixes. And I don't mean just for super obvious crashes. I was most impressed with a deadlock where numerous engineers had tried and failed to understand exactly how to fix it.
LLMs are good where there is a lot of detail but the answer to be found is simple.
This is sort of the opposite of vibe coding, but LLMs are OK at that too.
> LLMs are good where there is a lot of detail but the answer to be found is simple.
Oooo I like that. Will try and remember that one.
Amusingly, my experience is that the longer an issue takes me to debug the simpler and dumber the fix is. It's tragic really.
Hmmm, I'm a bit confused by their conclusions (encouraging use) given some of the really damning caveats they point out. A tool they themselves determine to need such careful oversight probably just shouldn't be used near prod at all.
For the same quality and quantity output, if the cost of using LLMs + the cost of careful oversight is less than the cost of not using LLMs then the rational choice is to use them.
Naturally this doesn’t factor in things like human obsolescence, motivation and self-worth.
It seems like this would be a really interesting field to research. Does AI assisted coding result in fewer bugs, or more bugs, vs an unassisted human?
I've been thinking about this as I do AoC with Copilot enabled. It's been nice for those "hmm, how do I do that in $LANGUAGE again?" moments, but it's also written some nice-looking snippets that don't do quite what I want them to. And many cases of "hmmm... that would work, but it would read the entire file twice for no reason".
My guess, however, is that it's a net gain for quality and productivity. Humans make bugs too and there need to be processes in place to discover and remediate those regardless.
I'm not sure about research, but I've used LLMs for a few things here at Oxide with (what I hope is) appropriate judgment.
I'm currently trying out using Opus 4.5 to take care of a gnarly code reorganization that would take a human most of a week to do -- I spent a day writing a spec (by hand, with some editing advice from Claude Code), having it reviewed as a document for humans by humans, and feeding it into Opus 4.5 on some test cases. It seems to work well. The spec is, of course, in the form of an RFD, which I hope to make public soon.
I like to think of the spec as basically an extremely advanced sed script described in ~1000 English words.
Maybe it's not as necessary with a codebase as well-organized as Oxide's, but I found Gemini 3 useful for a refactor of some completely test-free ML research code recently. I got it to generate a test case which would exercise all the code subject to refactoring, got it to do the refactoring and verify that it leads to exactly the same state, then finally got it to randomize the test inputs and keep repeating the comparison.
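The test it wrote was essentially a characterization test. A minimal sketch of the shape, with hypothetical module and function names standing in for the research code:

    # Sketch of the refactor-preserving test; `legacy` and `refactored` are
    # hypothetical modules holding the original and refactored implementations.
    import random

    import pytest

    import legacy
    import refactored


    def random_batch(rng: random.Random) -> list[float]:
        # Randomized inputs, regenerated from the seed so runs are reproducible.
        return [rng.uniform(-1.0, 1.0) for _ in range(32)]


    @pytest.mark.parametrize("seed", range(50))
    def test_refactor_preserves_state(seed: int) -> None:
        batch = random_batch(random.Random(seed))
        # Same inputs in, exactly the same state out.
        assert refactored.train_step(batch, seed=seed) == legacy.train_step(batch, seed=seed)

Once that held on fixed inputs, randomizing the seed and repeating the comparison was the whole game.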
And it doesn't factor in seniority/experience. What's good for a senior developer is not necessarily the same for a beginner.
Medication is littered with warning labels, but humans still use it to combat illness. Social media can harm mental health, yet people still use it. Pick whatever other example you'd like.
There are things in life that have high risks of harm if misused yet people still use them because there are great benefits when carefully used. Being aware of the risks is the key to using something that can be harmful, safely.
I would think some of their engineers love using LLMs, it would be unfair to them to completely disallow it IMO (even as someone who hates LLMs)
Junior engineers are the usual comparison folks make to LLMs, which is apt as juniors need lots of oversight.
There’s a lot of code that doesn’t hit prod.
What do you find confusing about the document encouraging use of LLMs?
The document includes statements like "LLMs are superlative at reading comprehension", "LLMs can be excellent editors", "LLMs are amazingly good at writing code".
The caveats are really useful: if you've anchored your expectations on "these tools are amazing", the caveats bring you closer to what they've observed.
Or, if you're anchored on "the tools aren't to be used", the caveats give credibility to the document's suggestions of what LLMs are useful for.
The ultimate conclusion seems to be one that leaves it to personal responsibility: the user of the LLM is responsible for ensuring the LLM has done its job correctly. While this is the ethical conclusion to me, the “gap” left to personal responsibility is so large that it makes me question how useful everything else in this document really is.
I don’t think it is easy to create a concise set of rules to apply in this gap for something as general as LLM use, but I do think such a ruleset is noticeably absent here.
Cantrill jumps on every bandwagon. When he assisted in cancelling a Node developer (not a native English speaker) over pronouns he was following the Zeitgeist, now "Broadly speaking, LLM use is encouraged at Oxide."
Find it interesting that the section about an LLM's tells when using it for writing is absolutely littered with em-dashes.
To be fair, LLMs usually use em-dashes correctly, whereas I think this document misuses them more often than not. For example:
> This can be extraordinarily powerful for summarizing documents — or of answering more specific questions of a large document like a datasheet or specification.
That dash shouldn't be there. That's not a parenthetical clause, that's an element in a list separated by "or." You can just remove the dash and the sentence becomes more correct.
LLMs also generally don't put spaces around em dashes — but a lot of human writers do.
I don't know whether that use of the em-dash is grammatically correct, but I've seen enough native English writers use it like that. One example is Philip K Dick.
You can stop LLMs from using em-dashes by just telling them to "never use em-dashes". This same type of prompt engineering works to mitigate almost every sign of AI-generated writing, which is one reason why AI writing heuristics/detectors can never be fully reliable.
This does not work on Bryan, however.
I believe Bryan is a well known em dash addict
>I believe Bryan is a well known em dash addict
I was hoping he'd make the leaderboard, but perhaps the addiction took proper hold in more recent years:
And I mean no disrespect to him for it, it’s just kind of funny
There was a comment recently by HN's most enthusiastic LLM cheerleader, Simon Willison, that I stopped reading almost immediately (before seeing who posted it), because it exuded the slop stench of an LLM: https://news.ycombinator.com/item?id=46011877
A measured, comprehensive, and sensible take. Not surprising from Bryan. This was a nice line:
> it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open.
I think Oxide didn't include this in the RFD because they exclusively hire senior engineers, but in an organization that contains junior engineers I'd add something specific to help junior engineers understand how they should approach LLM use.
Bryan has 30+ years of challenging software (and now hardware) engineering experience. He memorably said that he's worked on and completed a "hard program" (an OS), which he defines as a program you doubt you can actually get working.
The way Bryan approaches an LLM is super different to how a 2025 junior engineer does so. That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.
I remember in the very first class I ever took on Web Design the teacher spent an entire semester teaching "first principles" of HTML, CSS and JavaScript by writing it in Notepad.
It was only then did she introduce us to the glory that was Adobe Dreamweaver, which (obviously) increased our productivity tenfold.
DreamWeaver absolutely destroyed the code with all kinds of tags and unnecessary stuff. Especially if you used the visual editor. It was fun for brainstorming but plain notepad with clean understandable code was far far better (and with the browser compatibility issues the only option if you were going to production).
The HTML generated by Dreamweaver's WYSIWYG mode might not have been ideal, but it was far superior to the mess produced by MS Front Page. With Dreamweave, it was at least possible to use it as a starting point.
After 25 or so years doing this, I think there are two kinds of developers: craftsmen and practical “does it get the job done” types. I’m the former. The latter seem to be what makes the world go round.
It takes both.
I miss Dreamweaver. Combining it with Fireworks was a crazy productive combo for me back in the mid 00’s!
My first PHP scripts and games were written using nothing more than Notepad too funnily enough
Back in the early 00s I brought gvim.exe on a floppy disk to school because I refused to write XSLT, HTML, CSS, etc without auto-indent or syntax highlighting.
> That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.
Years ago I had to spend many months building nothing but Models (as in MVC) for a huge data import / ingest the company I worked on was rewriting. It was just messy enough that it couldn't be automated. I almost lost my mind from the dull monotony and started even having attendance issues. I know today that could have been done with an LLM in minutes. Almost crazy how much time I put into that project compared to if I did it today.
The issue is that it might look good but an LLM often inserts weird mistakes. Or ellipses. Or overindex on the training data. If someone is not careful it is easy to completely wreck the codebase by piling on seemingly innocuous commits. So far I have developed a good sense for when I need to push the llm to avoid sloppy code. It is all in the details.
But a junior engineer would never find/anticipate those issues.
I am a bit concerned. Because the kind of software I am making, a llm would never prompt on its own. A junior cannot make it, it requires research and programming experience that they do not have. But I know that if I were a junior today, I would probably try to use llms as much as possible and would probably know less programming over time.
So it seems to me that we are likely to have worse software over time. Perhaps a boon for senior engineers but how do we train junior devs in that environment? Force them to build slowly, without llms? Is it aligned with business incentives?
Do we create APIs expecting the code to be generated by LLMs or written by hand? Because the impact of verbosity is not necessarily the same. LLMs don't get tired as fast as humans.
> That junior engineer possibly hasn't programmed without the tantalizing, even desperately tempting option to be assisted by an LLM.
This gives me somewhat of a knee jerk reaction.
When I started programming professionally in the 90s, the internet came of age and I remember being told "in my days, we had books and we remembered things" which of course is hilarious because today you can't possibly retain ALL the knowledge needed to be software engineer due to the sheer size of knowledge required today to produce a meaningful product. It's too big and it moves too fast.
There was this long argument that you should know things and not have to look it up all the time. Altavista was a joke, and Google was cheating.
Then syntax highlighting came around and there'd always be a guy going "yeah nah, you shouldn't need syntax highlighting to program, you screen looks like a Christmas tree".
Then we got stuff like auto-complete, and it was amazing, the amount of keystrokes we saved. That too, was seen as heresy by the purists (followed later by LSP - which many today call heresy).
That reminds me also, back in the day, people would have entire Encyclopaedia on DVDs collections. Did they use it? No. But they criticised Wikipedia for being inferior. Look at today, though.
Same thing with LLMs. Whether you use them as a powerful context based auto-complete, as a research tool faster than wikipedia and google, as rubber-duck debugger, or as a text generator -- who cares: this is today, stop talking like a fossil.
It's 2025 and junior developers can't work without LSP and LLM? It's fine. They're not in front of a 386 DX33 with 1 book of K&R C and a blue EDIT screen. They have massive challenged ahead of them, the IT world is complete shambles, and it's impossible to decipher how anything is made, even open source.
Today is today. Use all the tools at hand. Don't shame kids for using the best tools.
We should be talking about sustainability of such tools rather than what it means to use them (cf. enshittification, open source models etc.)
It is not clear though, which tools enable and which tools inhibit your development at the beginning of your journey.
Agreed, although LLMs definitely qualify as enabling developers compared to <social media, Steam, consoles, and other distractions> of today.
The Internet itself is full of distractions. My younger self spent a crazy amount of time on IRC. So it's not different than spending time on say, Discord today.
LLMs have pretty much a direct relationship with Google. The quality of the response has much to do with the quality of the prompt. If anything, it's the overwhelming nature of LLMs that might be the problem. Back in the day, if you had, say a library access, the problem was knowing what to look for. Discoverability with LLMs is exponential.
As for LLM as auto-complete, there is an argument to be made that typing a lot reinforces knowledge in the human brain like writing. This is getting lost, but with productivity gains.
Watching my juniors constantly fight the nonsense auto completion suggestions their LLM editor of choice put in front of them, or worse watching them accept it and proceed to get entirely lost in the sauce, I’m not entirely convinced that the autocompletion part of it is the best one.
Tools like Claude code with ask/plan mode seem to be better in my experience, though I absolutely do wonder about the lack of typing causing a lack of memory formation
A rule I set myself a long time ago was to never copy paste code from stack overflow or similar websites. I always typed it out again. Slower, but I swear it built the comprehension I have today.
> Watching my juniors constantly fight the nonsense auto completion suggestions their LLM editor of choice put in front of them, or worse watching them accept it and proceed to get entirely lost in the sauce, I’m not entirely convinced that the autocompletion part of it is the best one.
That's not an LLM problem, they'd do the same thing 10 years ago with stack overflow: argue about which answer is best, or trust the answer blindly.
> but I swear it built the comprehension I have today.
For interns/junior engineers, the choice is: comprehension VS career.
And I won't be surprised if most of them will go with career now, and comprehension.. well thanks maybe tomorrow (or never).
I don’t think that’s the dichotomy. I’ve been in charge of hiring at a few companies, and comprehension is what I look for 10 times out of 10.
>"in my days, we had books and we remembered things" which of course is hilarious
it isn't hilarious, it's true. My father (now in his 60s) who came from a blue collar background with very little education taught himself programming by manually copying and editing software out of magazines, like a lot of people his age.
I teach students now who have access to all the information in the world but a lot of them are quite literally so scatterbrained and heedless anything that isn't catered to them they can't process. Not having working focus and memory is like having muscle atrophy of the mind, you just turn into a vegetable. Professors across disciplines have seen decline in student abilities, and for several decades now, not just due to LLMs.
> When I started programming professionally in the 90s, the internet came of age and I remember being told "in my days, we had books and we remembered things" which of course is hilarious because today you can't possibly retain ALL the knowledge needed to be software engineer due to the sheer size of knowledge required today to produce a meaningful product. It's too big and it moves too fast.
But I mean, you can get by without memorizing stuff sure, but memorizing stuff does work out your brain and does help out in the long run? Isn't it possible we've reached the cliff of "helpful" tools to the point we are atrophying enough to be worse at our jobs?
Like, reading is surely better for the brain than watching TV. But constant cable TV wasn't enough to ruin our brains. What if we've got to the point it finally is enough?
As usual with Oxide's RFDs, I found myself vigorously head-nodding while reading. Somewhat rarely, I found a part that I found myself disagreeing with:
> Unlike prose, however (which really should be handed in a polished form to an LLM to maximize the LLM’s efficacy), LLMs can be quite effective writing code de novo.
Don't the same arguments against using LLMs to write one's prose also apply to code? Was this structure of the code and ideas within the engineers'? Or was it from the LLM? And so on.
Before I'm misunderstood as a LLM minimalist, I want to say that I think they're incredibly good at solving for the blank page syndrome -- just getting a starting point on the page is useful. But I think that the code you actually want to ship is so far from what LLMs write, that I think of it more as a crutch for blank page syndrome than "they're good at writing code de novo".
I'm open to being wrong and want to hear any discussion on the matter. My worry is that this is another one of the "illusion of progress" traps, similar to the one that currently fools people with the prose side of things.
I guess to follow up slightly more:
- I think the "if you use another model" rebuttal is becoming like the No True Scotsman of the LLM world. We can get concrete and discuss a specific model if need be.
- If the use case is "generate this function body for me", I agree that that's a pretty good use case. I've specifically seen problematic behavior for the other ways I'm seeing it OFTEN used, which is "write this feature for me", or trying to one shot too much functionality, where the LLM gets to touch data structures, abstractions, interface boundaries, etc.
- To analogize it to writing: They shouldn't/cannot write the whole book, they shouldn't/cannot write the table of contents, they cannot write a chapter, IMO even a paragraph is too much -- but if you write the first sentence and the last sentence of a paragraph, I think the interpolation can be a pretty reasonable starting point. Bringing it back to code for me means: function bodies are OK. Everything else gets questionable fast IME.
One difference is that clichéd prose is bad and clichéd code is generally good.
Depends on what your prose is for. If it's for documentation, then prose which matches the expected tone and form of other similar docs would be clichéd in this perspective. I think this is a really good use of LLMs - making docs consistent across a large library / codebase.
I have been testing agentic coding with Claude 4.5 Opus and the problem is that it's too good at documentation and test cases. It's thorough in a way that it goes out of scope, so I have to edit it down to increase the signal-to-noise.
The “change capture”/straight jacket style tests LLMs like to output drive me nuts. But humans write those all the time too so I shouldn’t be that surprised either!
A problem I’ve found with LLMs for docs is that they are like ten times too wordy. They want to document every path and edge case rather focusing on what really matters.
It can be addressed with prompting, but you have to fight this constantly.
I think probably my most common prompt is "Make it shorter. No more than ($x) (words|sentences|paragraphs)."
Docs also often don’t have anyone’s name on them, in which case they’re already attributed to an unknown composite author.
Writing is an expression of an individual, while code is a tool used to solve a problem or achieve a purpose.
The more examples of different types of problems being solved in similar ways present in an LLM's dataset, the better it gets at solving problems. Generally speaking, if it's a solution that works well, it gets used a lot, so "good solutions" become well represented in the dataset.
Human expression, however, is diverse by definition. The expression of the human experience is the expression of a data point on a statistical field with standard deviations the size of chasms. An expression of the mean (which is what an LLM does) goes against why we care about human expression in the first place. "Interesting" is a value closely paired with "different".
We value diversity of thought in expression, but we value efficiency of problem solving for code.
There is definitely an argument to be made that LLM usage fundamentally restrains an individual from solving unsolved problems. It also doesn't consider the question of "where do we get more data from".
>the code you actually want to ship is so far from what LLMs write
I think this is a fairly common consensus, and my understanding is the reason for this issue is limited context window.
I argue that the intent of an engineer is contained coherently across the code of a project. I have yet to get an LLM to pick up on the deeper idioms present in a codebase that help constrain the overall solution towards these more particular patterns. I’m not talking about syntax or style, either. I’m talking about e.g. semantic connections within an object graph, understanding what sort of things belong in the data layer based on how it is intended to be read/written, etc. Even when I point it at a file and say, “Use the patterns you see there, with these small differences and a different target type,” I find that LLMs struggle. Until they can clear that hurdle without requiring me to restructure my entire engineering org they will remain as fancy code completion suggestions, hobby project accelerators, and not much else.
I recently published an internal memo which covered the same point, but I included code. I feel like you still have a "voice" in code, and it provides important cues to the reviewer. I also consider review to be an important learning and collaboration moment, which becomes difficult with LLM code.
> I think that the code you actually want to ship is so far from what LLMs write
It depends on the LLM, I think. A lot of people have a bad impression of them as a result of using cheap or outdated LLMs.
In my experience, LLMs have been quite capable of producing code I am satisfied with (though of course it depends on the context — I have much lower standards for one-off tools than long-lived apps). They are able to follow conventions already present in a codebase and produce something passable. Whereas with writing prose, I am almost never happy with the feel of what an LLM produces (worth noting that Sonnet and Opus 4.5’s prose may be moving up from disgusting to tolerable). I think of it as prose being higher-dimensional — for a given goal, often the way to express it in code is pretty obvious, and many developers would do essentially the same thing. Not so for prose.
> Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it. Moreover, once in the loop of peer review, generation should more or less be removed: if code review comments are addressed by wholesale re-generation, iterative review becomes impossible.
My general procedure for using an LLM to write code, which is in the spirit of what is advocated here, is:
1) First, feed in the existing relevant code into an LLM. This is usually just a few source files in a larger project
2) Describe what I want to do, either giving an architecture or letting the LLM generate one. I tell it to not write code at this point.
3) Let it speak about the plan, and make sure that I like it. I will converse to address any deficiencies that I see, and I almost always do.
4) I then tell it to generate the code
5) I skim & test the code to see if it's generally correct, and have it make corrections as needed
6) Closely read the entire generated artifact at this point, and make manual corrections (occasionally automatic corrections like "replace all C style casts with the appropriate C++ style casts" then a review of the diff)
The hardest part for me is #6, where I feel a strong emotional bias towards not doing it, since I am not yet aware of any errors compelling such action.
This allows me to operate at a higher level of abstraction (architecture) and remove the drudgery of turning an architectural idea into written, precise, code. But, when doing so, you are abandoning those details to a non-deterministic system. This is different from, for example, using a compiler or higher level VM language. With these other tools, you can understand how they work and rapidly have a good idea of what you're going to get, and you have robust assurances. Understanding LLMs helps, but thus not to the same degree.
I've found that your step 6 takes the vast majority of the time I spend programming with LLMs. Like 10X+ the combined total of time steps 1-5 take. And that's if the code the LLM produced actually works. If it doesn't work (which happens quite often), then even more handholding and corrections are needed. It's really a grind. I'm still not sure whether I am net saving time using these tools.
I always wonder about the people who say LLMs save them so much time: Do you just accept the edits they make without reviewing each and every line?
You can have the tool start by writing an implementation plan describing the overall approach and key details including references, snippets of code, task list, etc. That is much faster than a raw diff to review and refine to make sure it matches your intent. Once that's acceptable the changes are quick, and having the machine do a few rounds of refinement to make sure the diff vs HEAD matches the plan helps iron out some of the easy issues before human eyes show up. The final review is then easier because you are only checking for smaller issues and consistency with the plan that you already signed off on.
It's not magic though, this still takes some time to do.
I exclusively use the autocomplete in cursor. I hate reviewing huge chunks of llm code at one time. With the autocomplete, I’m in full control of the larger design and am able to quickly review each piece of llm code. Very often it generates what I was going to type myself.
Anything that involves math or complicated conditions I take extra time on.
I feel I’m getting code written 2 to 3 times faster this way while maintaining high quality and confidence
Don’t make manual corrections.
If you keep all edits driven by the LLM, you can use that knowledge later in the session, or ask your model to commit the guidelines to long-term memory.
I've had the same thought about 'written' text with an LLM. If you didn't spend time writing it don't expect me to read it. I'm glad he seems to be taking a hard stance on that saying they won't use LLMs to write non-code artifacts. This principle extends to writing code as well to some degree. You shouldn't expect other people to peer review 'your' code which was simply generated because, again, you spent no time making it. You have to be the first reviewer. Whether these cultural norms are held firmly remains to be seen (I don't work there), but I think they represent thoughtful application of emerging technologies.
> it is presumed that of the reader and the writer, it is the writer that has undertaken the greater intellectual exertion. (That is, it is more work to write than to read!)
This applies to natural language, but, interestingly, the opposite is true of code (in my experience and that of other people that I've discussed it with).
See: Kernighan's Law
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
https://www.laws-of-software.com/laws/kernighan/
I think people misunderstand this quote. Cleverness in this context refers to complexity, and generally stems from falling in love with some complex mechanism you dream up to solve a problem rather than challenging yourself to create something simpler and easier to maintain. Bolting together bits of LLM-created code is far more likely to be "clever" than good.
What an amazing quote!
> LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well.
I think this gets at a key point, but I'm not sure of the right way to articulate it.
A human-written comment may be worth something, but an LLM-generated one is cheap/worthless.
The nicest phrase capturing the thought I saw was: "I'd rather read the prompt".
Letting an LLM generate it again is probably just as good as publishing something written by an LLM in the first place.
I'll give it a shot.
Text, images, art, and music are all methods of expressing our internal ideas to other human beings. Our thoughts are the source, and these methods are how they are expressed. Our true goal in any form of communication is to understand the internal ideas of others.
An LLM expresses itself in all the same ways, but the source doesn't come from an individual - it comes from a giant dataset. This could be considered an expression of the aggregate thoughts of humanity, which is fine in some contexts (like retrieval of ideas and information highly represented in the data/world), but not when presented in a context of expressing the thoughts of an individual.
LLMs express the statistical summation of everyone's thoughts. They present the mean, when what we're really interested in are the data points a couple of standard deviations away from it. That's where all the interesting, unique, and thought-provoking ideas are. Diversity is at the core of the human experience.
---
An interesting paradox is the use of LLMs for translation into a non-native language. LLMs are actively being used to express an individual's ideas in words better than they could manage with their limited language proficiency, but for those of us on the receiving end, we interpret the expression as a mirror of the source and have immediate suspicions about the legitimacy of the individual's thoughts. Which is a little unfortunate for those who just want to express themselves better.
I think more people should read Naur's "programming as theory building".
A comment is an attempt to more fully document the theory the programmer has. Not all theory can be expressed in code. Both code and comment are lossy artefacts that are "projections" of the theory into text.
LLMs currently, I believe, cannot have a theory of the program. But they can definitely perform a useful simulacrum of such. I have not yet seen an LLM generated comment that is truly valuable. Of course, lots of human generated comments are not valuable either. But the ceiling for human comments is much, much higher.
> assurance that the model will not use the document to train future iterations of itself.
believing this in 2025 is really fascinating. this is like believing Meta won’t use info they (il)legally collected about you to serve you ads
I would have expected at least some consideration of public perception, given the extremely negative opinions many people hold about LLMs being trained on stolen data. Whether it's an ethical issue or a brand hazard depends on your opinions about that, but it's definitely at least one of those currently.
I made the mistake of first reading this as a document intended for everyone, when it is really an internal document that just happens to be public.
This is a technical document that is useful in illustrating how the guy who gave a talk once (one I didn’t understand, but was captivated by), and who is well-respected in his field, intends to guide his company’s use of the technology, so that other companies and individual programmers may learn from it too.
I don’t think the objective was to take any outright ethical stance, but to provide guidance about something ostensibly used at an employee’s discretion.
He speaks of trust and LLMs breaking that trust. Is this not what you mean, but by another name?
> First, to those who can recognize an LLM’s reveals (an expanding demographic!), it’s just embarrassing — it’s as if the writer is walking around with their intellectual fly open. But there are deeper problems: LLM-generated writing undermines the authenticity of not just one’s writing but of the thinking behind it as well. If the prose is automatically generated, might the ideas be too? The reader can’t be sure — and increasingly, the hallmarks of LLM generation cause readers to turn off (or worse).
> Specifically, we must be careful to not use LLMs in such a way as to undermine the trust that we have in one another
> our writing is an important vessel for building trust — and that trust can be quickly eroded if we are not speaking with our own voice
The guide is generally very well thought out, but I see an issue in this part:
It sets the rule that things must actually be read when there’s a social expectation (code interviews, for example) but otherwise… remarks that use of LLMs to assist comprehension has little downside.
I find two problems with this:
- there is an incoherence there. If LLMs are flawless at reading and summarization, there is no difference from reading the original. And if they aren’t flawless, then that flaw also extends to the non-social stuff.
- in practice, I haven’t found LLMs so good as reading assistants. I’ve sent them to check a linked doc and they’ve just read the index and inferred the context, for example. Just yesterday I asked for a comparison of three technical books on a similar topic, and it wrongly guessed the third one rather than follow the three links.
There is a significant risk in placing a translation layer between content and reader.
> Just yesterday I asked for a comparison of three technical books on a similar topic, and it wrongly guessed the third one rather than follow the three links.
I would consider this a failure in their tool use capabilities, not their reading ones.
To use them to read things (without relying on their much less reliable tool use), take the thing and put it in the context window yourself.
They still aren't perfect of course, but they are reasonably good.
Three whole books would likely exceed their context window size, of course; I'd take this as a sign that they aren't up to a task of that magnitude yet.
I wonder if they would be willing to publish the "LLMs at Oxide" advice, linked in the OP [1], but currently publicly inaccessible.
[1] https://github.com/oxidecomputer/meta/tree/master/engineerin...
> Ironically, LLMs are especially good at evaluating documents to assess the degree that an LLM assisted their creation
Is there any evidence for this?
>Wherever LLM-generated code is used, it becomes the responsibility of the engineer. As part of this process of taking responsibility, self-review becomes essential: LLM-generated code should not be reviewed by others if the responsible engineer has not themselves reviewed it
By this article's own standards, there are now 2 authors who don't understand what they've produced.
Nobody has yet explained how an LLM can be better than a well-paid human expert.
A well paid human expert can find lots of uses of LLMs. I'm still not convinced that humans will ever be totally replaced, and what work will look like is human experts using LLMs as another tool in the toolbox, just like how an engineer would have used a slide rule or mechanical calculator back in the day. The kind of work they're good at doesn't cover the full range of necessary engineering tasks, but they do open up new avenues. For instance, yesterday I was able to get the basic gist of three solutions for a pretty complex task in about an hour. The result of that was me seeing that two of them were unlikely to work for what I'm doing, so that now I can invest actual effort in the third solution.
The not needing to pay it well.
The empathy section is quite interesting
> When debugging a vexing problem one has little to lose by using an LLM — but perhaps also little to gain.
This probably doesn't give them enough credit. If you can feed an LLM a list of crash dumps, it can do a remarkable job producing both analyses and fixes. And I don't mean just for super obvious crashes. I was most impressed with a deadlock where numerous engineers had tried and failed to understand exactly how to fix it.
LLMs are good where there is a lot of detail but the answer to be found is simple.
This is sort of the opposite of vibe coding, but LLMs are OK at that too.
> LLMs are good where there is a lot of detail but the answer to be found is simple.
Oooo I like that. Will try and remember that one.
Amusingly, my experience is that the longer an issue takes me to debug the simpler and dumber the fix is. It's tragic really.
Hmmm, I'm a bit confused by their conclusions (encouraging use) given some of the really damning caveats they point out. A tool they themselves determine to need such careful oversight probably just shouldn't be used near prod at all.
For the same quality and quantity of output, if the cost of using LLMs plus the cost of careful oversight is less than the cost of not using LLMs, then the rational choice is to use them.
Naturally this doesn’t factor in things like human obsolescence, motivation and self-worth.
It seems like this would be a really interesting field to research. Does AI assisted coding result in fewer bugs, or more bugs, vs an unassisted human?
I've been thinking about this as I do AoC with Copilot enabled. It's been nice for those "hmm how do I do that in $LANGUAGE again?" moments, but it's also written some nice-looking snippets that don't do quite what I want. And many cases of "hmmm... that would work, but it would read the entire file twice for no reason".
My guess, however, is that it's a net gain for quality and productivity. Humans make bugs too and there need to be processes in place to discover and remediate those regardless.
I'm not sure about research, but I've used LLMs for a few things here at Oxide with (what I hope is) appropriate judgment.
I'm currently trying out using Opus 4.5 to take care of a gnarly code reorganization that would take a human most of a week to do -- I spent a day writing a spec (by hand, with some editing advice from Claude Code), having it reviewed as a document for humans by humans, and feeding it into Opus 4.5 on some test cases. It seems to work well. The spec is, of course, in the form of an RFD, which I hope to make public soon.
I like to think of the spec as basically an extremely advanced sed script described in ~1000 English words.
Maybe it's not as necessary with a codebase as well-organized as Oxide's, but I recently found Gemini 3 useful for a refactor of some completely test-free ML research code. I got it to generate a test case that would exercise all the code subject to refactoring, got it to do the refactoring and verify that it leads to exactly the same state, then finally got it to randomize the test inputs and keep repeating the comparison (roughly the loop sketched below).
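A minimal sketch of that verification loop (hypothetical: the sum_of_squares_old/sum_of_squares_new functions stand in for the real, much larger research code, and C++ stands in for whatever language the project actually used). The original and refactored implementations are run side by side over many randomized inputs, and any divergence fails the run:

    // Characterization-style check: the old and new implementations must agree
    // on every randomized input. All names here are hypothetical stand-ins.
    #include <cassert>
    #include <cstddef>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Pre-refactor behavior we want to preserve.
    static long sum_of_squares_old(const std::vector<int>& xs) {
        long total = 0;
        for (std::size_t i = 0; i < xs.size(); ++i) {
            total += static_cast<long>(xs[i]) * xs[i];
        }
        return total;
    }

    // Post-refactor implementation; must be observationally identical.
    static long sum_of_squares_new(const std::vector<int>& xs) {
        long total = 0;
        for (int x : xs) {
            total += static_cast<long>(x) * x;
        }
        return total;
    }

    int main() {
        std::mt19937 rng(42);  // fixed seed keeps any failure reproducible
        std::uniform_int_distribution<int> value(-1000, 1000);
        std::uniform_int_distribution<std::size_t> length(0, 256);

        for (int trial = 0; trial < 10000; ++trial) {
            std::vector<int> xs(length(rng));
            for (int& x : xs) x = value(rng);
            // Any divergence between old and new behavior fails the run.
            assert(sum_of_squares_old(xs) == sum_of_squares_new(xs));
        }
        std::puts("refactor preserved behavior on all randomized trials");
        return 0;
    }

The design point is the one the comment makes: the randomized comparison, not the reviewer's eyeballs, carries the burden of showing the refactor changed nothing observable.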
And it doesn't factor in seniority/experience. What's good for a senior developer is not necessarily the same for a beginner.
Medication is littered with warning labels but humans still use it to combat illness. Social media can harm mental health yet people still use it. Pick whatever other example you'd like.
There are things in life that have high risks of harm if misused yet people still use them because there are great benefits when carefully used. Being aware of the risks is the key to using something that can be harmful, safely.
I would think some of their engineers love using LLMs; it would be unfair to them to completely disallow it IMO (even as someone who hates LLMs)
Junior engineers are the usual comparison folks make to LLMs, which is apt as juniors need lots of oversight.
There’s a lot of code that doesn’t hit prod.
What do you find confusing about the document encouraging use of LLMs?
The document includes statements like "LLMs are superlative at reading comprehension", "LLMs can be excellent editors", "LLMs are amazingly good at writing code".
The caveats are really useful: if you've anchored your expectations on "these tools are amazing", the caveats bring you closer to what they've observed.
Or, if you're anchored on "the tools aren't to be used", the caveats give credibility to the document's suggestions of what LLMs are useful for.
The ultimate conclusion seems to be one that leaves it to personal responsibility - the user of the LLM is responsible for ensuring the LLM has done its job correctly. While this is the ethical conclusion to me, the “gap” left to personal responsibility is so large that it makes me question how useful everything else in this document really is.
I don’t think it is easy to create a concise set of rules to apply in this gap for something as general as LLM use, but I do think such a ruleset is noticeably absent here.
Cantrill jumps on every bandwagon. When he assisted in cancelling a Node developer (not a native English speaker) over pronouns, he was following the Zeitgeist; now it's "Broadly speaking, LLM use is encouraged at Oxide."
He is a long way from Sun.
For those interested, here's a take from Bryan after that incident https://bcantrill.dtrace.org/2013/11/30/the-power-of-a-prono...
Find it interesting that the section about an LLM's tells when using it for writing is itself absolutely littered with em dashes
To be fair, LLMs usually use em-dashes correctly, whereas I think this document misuses them more often than not. For example:
> This can be extraordinarily powerful for summarizing documents — or of answering more specific questions of a large document like a datasheet or specification.
That dash shouldn't be there. That's not a parenthetical clause, that's an element in a list separated by "or." You can just remove the dash and the sentence becomes more correct.
LLMs also generally don't put spaces around em dashes — but a lot of human writers do.
I don't know whether that use of the em-dash is grammatically correct, but I've seen enough native English writers use it like that. One example is Philip K Dick.
You can stop LLMs from using em-dashes by just telling it to "never use em-dashes". This same type of prompt engineering works to mitigate almost every sign of AI-generated writing, which is one reason why AI writing heuristics/detectors can never be fully reliable.
This does not work on Bryan, however.
I believe Bryan is a well known em dash addict
>I believe Bryan is a well known em dash addict
I was hoping he'd make the leaderboard, but perhaps the addiction took proper hold in more recent years:
https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo...
https://news.ycombinator.com/user?id=bcantrill
No doubt his em dashes are legit, of course.
And I mean no disrespect to him for it, it’s just kind of funny
There was a comment recently by HN's most enthusiastic LLM cheerleader, Simon Willison, that I stopped reading almost immediately (before seeing who posted it), because it exuded the slop stench of an LLM: https://news.ycombinator.com/item?id=46011877
However, I was surprised to see that when someone (not me) accused him of using an LLM to write his comment, he flatly denied it: https://news.ycombinator.com/item?id=46011964
Which I guess means (assuming he isn't lying) if you spend too much time interacting with LLMs, you eventually resemble one.
Based on paragraph length, I would assume that "LLMs as writers" is the most extensive use case.
The problem with this text is it's a written anecdote. Could all be fake.
I disagree with LLMs as Editors. The amount of — in the post is crazy.