Minions: Stripe’s one-shot, end-to-end coding agents

Submitted five times so far: https://hn.algolia.com/?dateRange=pastWeek&page=0&prefix=tru...

Once with substantial discussions: https://news.ycombinator.com/item?id=47086557 (127 points | 2 days ago | 65 comments)

This is what happens when you create an environment where every staff engineer believes they need to show impact with AI to protect their career, and acts like an expert in it, even though they're learning how best to use AI at exactly the same time as every senior and junior engineer at their company. They probably know about as much about what works well as a random hobbyist at university.

A bunch of people needed to make GeoCities sites before we got to Gmail.

"1000 PRs/week" with no breakdown of complexity or value is a vanity metric. If these are mostly migrations, boilerplate, and fixes for earlier, bug-ridden Minion PRs, then you've just created 1000 code reviews/week that waste human time on rubber-stamping. That's not productivity, that's busywork with extra steps.

It's like measuring productivity by how many people you pull into meetings each week. The CIA's Simple Sabotage Field Manual literally recommends holding as many meetings as possible with as many people as possible. The CIA should add "open as many PRs with AI as possible" to their list. Bonus sabotage points if the PRs are made from ambiguous "one-shot" attempts described in Slack with no follow up clarification.

I've thought about implementing the same at our company: something that iterates through all our tickets, one-shots them, and creates PRs.

But humans are still left to review the code in the end, and as a developer, code reviewing is one of my least favourite things.

I'm not sure I could spend the rest of my career just reviewing code, and never writing it. And I'm not sure my team would either. They would go insane.

As developers, by nature, we are creative. We like to solve problems. That's why we do what we do each day. We get a thrill when we solve the problem, test it, and it actually works. When we see it in production and users enjoying it. When we see the CPU usage go from 99% to 5%.

I fear we will soon become nothing more than the last validation step between AI and reality. And once AI becomes reality, which will be very soon, the days of development as we knew it will be over.

One thing I don’t see developers talking about much is that if your job is to only read code instead of writing it, how do you expect to stay good at reviewing code if you never write it?

I can only speak for myself, but when I review code I need to dig into my own experience writing code and remember what works and what doesn’t, which I’ve internalized over years of writing and manually debugging code. Take that out of the equation and I wouldn’t be good at reviewing code for long.

I used to write a lot of C++ back in the day, and I can still read and understand it for the most part, but I would never be able to effectively review anything non-trivial. I just don’t have enough recent experience writing it myself to have internalized all of the obscure pitfalls and gotchas. And just vomiting out some C++ from a bot and having it redo things until it appears to work correctly isn’t gonna help me with that.

“My job now is just reviewing code” is such an extremely short-sighted view. I’m terrified of a future where nobody understands anything anymore. I’m sure OpenAI and Anthropic would love this though.

And yeah, reviewing code is one of the more tedious and unfun parts of the job, so why would I want this?

One of the most annoying parts of my job is my supervisor, who used to be a dev but became a manager years ago. He doesn’t really understand the codebase anymore, and I spend so much time explaining basic things to him that it actually hinders our productivity when he wants to “contribute”. And let me just say that getting a Claude sub for the whole team hasn’t helped this at all.

And one last thing - every single engineer I know that needs to maintain a Stripe integration hates them with the power of a million suns.

Totally agree with this. When I review code I don't build a strong mental model of the system, and I think you can only really do that by solving the problems that arise while building the system yourself. I'm optimistic the pendulum will swing away from "hand off a spec to an agent(s)" and back towards engineers being engaged and directing LLMs to implement/optimize smaller, more specific pieces of code, with most of the direction being determined by the user.

>how do you expect to stay good at reviewing code if you never write it?

What exactly does "writing code" mean?

Are you telling me I have to write for loops and if elses forever?

I reckon the developers most excited about AI & agents never got the same thrill or satisfaction that you do. Those developers are plainly motivated by different things, and that’s okay.

Hardly anything substantial about how well this works in practice. It's a hiring ad.

> It's a hiring ad.

And also a project to pad someone's resume.

> The Leverage team builds surprisingly delightful internal products that Stripes can leverage to supercharge their productivity.

The Leverage team kind of sounds like the Department of Government Efficiency

> Over a thousand pull requests merged each week at Stripe are completely minion-produced, and while they’re human-reviewed, they contain no human-written code.

I pity the senior engineer, demoted from a helmsman into a human breakwater, tasked to stand steady against an ever-swelling sea of AI slop.

>> Over a thousand pull requests merged each week at Stripe are completely minion-produced, and while they’re human-reviewed, they contain no human-written code

> I pity the senior engineer, demoted from a helmsman into a human breakwater, tasked to stand steady against an ever-swelling sea of AI slop.

I'm skeptical that the human-in-the-loop, whose only task is to read code, is going to be able to review at the rate that the AI can produce.

It's Undefined Behaviour, now in every language.

The emphasis on one-shot execution is interesting. Most agent frameworks still rely on iterative loops with human checkpoints, but Stripe's approach of giving the agent a complete context dump upfront and letting it run seems closer to how senior engineers actually work - you read the whole PR/spec first, then write the code. The tricky part is always the context window: once your codebase exceeds what fits in context, one-shot falls apart and you're back to chunked reasoning. Curious if they hit that wall and how they handle repo-scale tasks.
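To make the "chunked reasoning" fallback concrete, here is a minimal Python sketch of one naive approach: greedily packing files into batches that each fit a fixed token budget. The 4-characters-per-token estimate, the budget, and the `estimate_tokens`/`chunk_files` names are illustrative assumptions, not anything Stripe has described.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def chunk_files(files: Dict[str, str], budget: int) -> List[List[str]]:
    """Greedily pack file paths into batches whose estimated size fits `budget` tokens."""
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for path, text in files.items():
        cost = estimate_tokens(text)
        if cost > budget:
            # A single oversized file would need its own splitting strategy.
            raise ValueError(f"{path} alone exceeds the {budget}-token budget")
        if current and used + cost > budget:
            # The current batch is full: close it and start a new one.
            chunks.append(current)
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    repo = {"a.py": "x" * 400, "b.py": "y" * 400, "c.py": "z" * 400}
    print(chunk_files(repo, budget=250))  # [['a.py', 'b.py'], ['c.py']]
```

Each batch would then go to a separate model call, which is exactly where reasoning over the whole change in one shot is lost.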

My experiments with a more hands-off approach to prompting Claude Code have always resulted in a two-steps-forward-one-step-back dynamic: the agent clearly did some good stuff, but did other stuff in a somewhat undesirable manner that then needed correcting.

This usually results in (A) commits where tons of code is constantly being added and removed, and (B), because of Claude's somewhat cavalier attitude toward existing code, a steady erosion of my familiarity with the code base.

I'm still not convinced that these longer loops are that beneficial compared to 1-minute prompts that kick off 5-10 minutes of AI work.

Doesn't delegating make this a lot more possible? You can fire off a request to a sub-agent, they respond with some predictable status that you can parse, and then you continue (you being the "main" agent), so the context window can remain relatively small. Kinda like how a human does it.
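A minimal Python sketch of the delegation pattern described here, assuming a sub-agent that returns a small JSON status object with `status` and `summary` fields. `fake_subagent` is a hypothetical stub so the example runs without any model; the point is that the main agent's context grows only by each summary, not by the sub-agent's full transcript.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Statuses the main agent knows how to handle (assumed protocol).
ALLOWED_STATUSES = {"ok", "error"}


@dataclass
class MainAgentContext:
    # The main agent keeps only short summaries, never sub-agent transcripts.
    summaries: List[str] = field(default_factory=list)

    def size(self) -> int:
        return sum(len(s) for s in self.summaries)


def fake_subagent(task: str) -> str:
    """Hypothetical stand-in for a sub-agent running in its own context window.

    A real sub-agent would read files and call tools; here we simulate a long
    transcript and return a small, machine-parseable JSON status.
    """
    transcript = f"exploring for {task!r}... " * 200
    status = "error" if "fail" in task else "ok"
    return json.dumps({
        "task": task,
        "status": status,
        "summary": f"{task}: {status}",
        "transcript_chars": len(transcript),
    })


def delegate(ctx: MainAgentContext, task: str) -> dict:
    """Send a task to a sub-agent, validate its status, and keep only the summary."""
    result = json.loads(fake_subagent(task))
    if result.get("status") not in ALLOWED_STATUSES:
        raise ValueError(f"unparseable sub-agent status: {result!r}")
    ctx.summaries.append(result["summary"])
    return result


if __name__ == "__main__":
    ctx = MainAgentContext()
    for task in ["add unit tests", "fail: migrate config"]:
        r = delegate(ctx, task)
        print(r["status"], r["transcript_chars"])
    print(ctx.summaries, ctx.size())
```

Validating the status before continuing is what makes the "predictable status you can parse" useful: anything outside the expected shape stops the main loop instead of silently polluting its context.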

Thousands of PRs a week generated by AI and requiring human review sounds like a ton, I wonder what their PR merge rate was before this?

Stripe has somewhere around 3000-3500 engineers on staff, so that's less than one PR per engineer per week across the org.
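The back-of-the-envelope math as a tiny Python check. It uses the post's "over a thousand" (so 1,000 is a floor and the real ratio could be somewhat higher) and this comment's 3,000 to 3,500 headcount estimate, which is not an official figure, and assumes PRs are spread evenly across all engineers.

```python
def prs_per_engineer(prs_per_week: float, engineers: int) -> float:
    """Average merged PRs per engineer per week, assuming an even spread."""
    return prs_per_week / engineers


if __name__ == "__main__":
    # 1000 = the post's lower bound; headcounts = the commenter's estimate.
    for headcount in (3000, 3500):
        print(headcount, round(prs_per_engineer(1000, headcount), 2))
    # 3000 0.33
    # 3500 0.29
```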

Thanks, I realized after I wrote it that the size of their staff was really the variable I was missing. Agreed that's not a remarkably high rate with such a large engineering org.

Who came up with the idea to slowly change the color of selected text? A minion?

The same minion that came up with the cute effect that covers your screen with the word DEVELOPERS, when you scroll to the end of an article?

I didn't read that far. Reminds me of Steve Ballmer:

https://www.youtube.com/watch?v=Vhh_GeBPOhs

Was this video part of the training set?

Minions – Stripe's Coding Agents Part 2

https://news.ycombinator.com/item?id=47086557

Stripe has become a weird company in my opinion. I'm glad Mollie is an option that doesn't force me into certain technical choices.

There’s something off-putting about writing a blog post about some splashy tech that is a fork of an open source project, and not making that tech open source too. It reads to me like “Hey, we thought the open source goose project was just okay, so we forked it to do it better. But we’re not going to contribute it back; instead we’re renaming it.”

I think it probably wouldn’t be as weird if the project were a meaningfully different fork, but it sounds like it’s trying to accomplish the same goals as the open source project, so I feel the changes should probably be ported back, and renaming it seems sorta ungrateful. Kinda like that “you made this? I made this” meme. Maybe I just don’t understand how different the projects are, though…

I don’t know enough about either but if their approach was to make it substantially more opinionated, which is likely in the case of an org that’s subject to audits, it would make sense to keep it separate.

…and you can get almost identical features by simply installing the GitHub app inside Slack, and then asking Copilot to work on something, this should take < 5m to set up for any organisation using Slack and GitHub.

They seem to have just optimized its integration with their existing tooling and workflows. That doesn't sound especially useful to the broader community. It's also probably different enough from goose at this point that rebranding it makes sense. I do think such integrations are hugely important for the productivity and usefulness of this sort of tool. It seems like the post is advocating for deep first-party (1p) integration to further improve the utility of coding agents.

How would they contribute back when their fork is a customization of how it works?

> We’ve customized the orchestration flow in an opinionated way to interleave agent loops and deterministic code

Is goose in such disrepair that you can just drop code changes into it and the smol developer auto-accepts it, happy that anyone is doing the work?

Or is goose actually its own project, with 250 issues and 74 PRs, that might have its own ideas about how it's built?
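For readers wondering what "interleave agent loops and deterministic code" in the quoted line might look like, here is a minimal Python sketch of one common version of the idea: a bounded loop where a non-deterministic agent step alternates with deterministic checks (standing in for linters or tests), and check failures are fed back to the agent. This is not Stripe's or goose's code; `run_interleaved`, `fake_agent`, and both checks are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

AgentStep = Callable[[str, str], str]   # (task, feedback) -> proposed change
Check = Callable[[str], Optional[str]]  # change -> error message, or None if OK


def run_interleaved(task: str, agent_step: AgentStep, checks: List[Check],
                    max_rounds: int = 3) -> Tuple[str, bool]:
    """Alternate a non-deterministic agent step with deterministic checks.

    Check failures are fed back to the agent; the loop is capped at max_rounds.
    """
    feedback = ""
    change = ""
    for _ in range(max_rounds):
        change = agent_step(task, feedback)               # agent loop (non-deterministic)
        errors = [msg for check in checks
                  if (msg := check(change)) is not None]  # deterministic gate
        if not errors:
            return change, True
        feedback = "\n".join(errors)
    return change, False


def fake_agent(task: str, feedback: str) -> str:
    # Hypothetical stub: the first attempt leaves a TODO, the retry fixes it.
    if not feedback:
        return "def f():\n    pass  # TODO\n"
    return "def f():\n    return 1\n"


def no_todo(change: str) -> Optional[str]:
    return "contains TODO" if "TODO" in change else None


def compiles(change: str) -> Optional[str]:
    try:
        compile(change, "<change>", "exec")
    except SyntaxError as exc:
        return f"syntax error: {exc}"
    return None


if __name__ == "__main__":
    change, ok = run_interleaved("implement f", fake_agent, [no_todo, compiles])
    print(ok)
    print(change)
```

The design choice is that the deterministic steps, not the model, decide when the loop ends, and the round cap keeps a confused agent from churning forever.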

The elephant in that room is that all these LLM's were trained on boatloads of open source software that they can remix enough to not violate any copyrights.

As an open source contributor, in some ways this makes me much more frustrated than someone making a closed source fork of a BSD licensed project.

My take for a very long time has been that any model trained in violation of copyright should not itself be copyrightable. It should be public domain.

This would mean that anyone who trained a model without permission to create a derivative work, whether implied by the work’s current license or obtained directly, would have to release the model’s weights.

You could argue that it’s fair use, but a fair use quotation of a work does not become the property of the one quoting it. If I quote a line from a song or a novel I do not now own rights to that line. So there’s precedent for this.

Isn't all content generated by generative models already in the public domain? Having something in the public domain doesn't force you to release it.

At the very least, that would be fair.

I feel like legal frameworks sometimes lose track of fairness

When everyone is using the same foundation (LLMs) this will become more and more common as a way to sell perceived benefits.

I don't have specific information about Minions, but I do know about Stripe's architecture and internal tooling.

The article isn't really talking about changes they made to goose; it's describing how they went about integrating goose with the rest of their developer infrastructure (i.e., the AWS-based remote devbox system, Toolshed, etc.).

Welcome to the free software movement!

Copyleft license would not help if they are only using it internally

AGPL would

Not unless they were exposing it over a network to non-Stripe people, AFAIK.

Do you have access to Stripe's Minion service so you can demand the source code?

No copyleft license requires contributing your changes upstream.

If they didn't violate the licence agreement then I'm struggling to understand why it's off putting

Just because it’s legal and allowed doesn’t mean it’s not off putting.

Personally, I have no issue with them making their own internal fork, but then blogging about their thing without contributing it back leaves a little bad taste. If it’s so good, then contribute it back, since they benefited from the volunteers.

You can't have it both ways. As a library author, you choose MIT to encourage commercial usage because companies are afraid of the GPL, but then complain that companies are actually using it the way MIT allows, without contributing back.

I can find it off putting regardless. Especially since I’m not the person who released it under MIT license.

License it GPL, and it will be fed to a model as training data to recreate it copyright free anyways.

Training falls outside of copyright concerns because of fair use, so proprietary or free is orthogonal. This is how the world is currently trending.

Law, spirit of the law, common decency. Rare currency these days, I know...

You don’t have to agree that it’s off-putting, but if you’re “struggling to understand why” that demonstrates a serious lack of empathy and awareness of social dynamics.

> If they didn't violate the licence agreement then I'm struggling to understand why it's off putting

What? Who cares about the license agreement? Lawyers and bureaucrats maybe. The real issue with _any_ software project is whether it is meant to be a step toward a more livable and peaceful world or not. Sure, some people make guided missile software to murder people for profit, but that's just obviously antisocial behavior, regardless of how well it complies with license agreements.

If you put up a sign on your house saying "businesses, feel free to come use my driveway for whatever you want" and McDonald's sets up a restaurant there then you won't have much sympathy from me.

Well sure, maybe in this case the driveway owner hasn't been slighted, as they consented to the use, but that doesn't mean that suddenly some other person critiquing Mickey D's for factory farming and using prison-slave labor to make uniforms is misguided. You can't just say, "Well, I'm struggling to see why it's off-putting for McDonald's to use that driveway for their slavery-poison-food operation".

I can't think of a less ergonomic way to submit a task than to write a huge Slack message with links and references everywhere.

This really puts the final nail in the coffin that was the legend that Slack developers trigger a minion from their phone during their commute.

It's also funny that they mention they used goose [1] as a starting point. I discovered them at a conference, and quickly realized that nobody was using that crap, to the point that literally every testimony on their website is from their own team.

[1] https://github.com/block/goose

The best camera is the camera you have on you.

Smartphones have terrible camera ergonomics, yet they killed the compact dedicated camera.

The responses to this are wild. I have worked with and built smaller systems like this and it is an incredible speed up.

So much reflexive hate against a genuinely transformative tech. Yes AI has annoying people and grifters, but it is genuinely incredible at some things and finding out how to use it effectively within a company is the most fun I’ve had in my career.

> The Leverage team builds surprisingly delightful internal products that Stripes can leverage to supercharge their productivity.

Why does this sound so insufferable?

None of the adjectives are literal, aside from "internal".

The whole thing is meant to play on your emotions, not convince your mind.

It made me disgusted reading it so I would say yes it did play with my emotions.

Sparkling DevEx

Unfortunately, I don't use Stripe products, because they discriminated against me by blocking my account when my project used a Blockchain (which I built myself) as an authentication mechanism.

It's discrimination because Blockchain tech is part of my religious beliefs... Why is it so that less intelligent people who believe that there is a man in the sky watching over them have protection against discrimination but I don't? Yet my beliefs are grounded in science and an actual understanding of our socio-economic system. I deserve more protection, not less!

Does the law require that one's beliefs be irrational in order to benefit from discrimination protections?

Based on the above article about thousands of AI written PRs littering their code base, you might replace “unfortunately” with “fortunately.”

> Blockchain tech is part of my religious beliefs.

What are your religious beliefs? I'm intrigued to hear more.

I believe the monetary system is broken and creates asymmetric monetary playing fields based on distance from monetary injection points (banks and governments). The tax system makes it hard for each unit of currency to travel far from a 'money printer'. After just 6 hops, a dollar is taxed down to about 10 cents; so people who are more than 6 hops from a money printer live in a much more scarce monetary environment than people who are in the front row. It's Cantillon effects on steroids. It means that the entire economy has become a kind of social climbing game to get closer to the money printers. I feel that this game is immoral and people shouldn't be forced to participate. Private currencies should be protected by law.
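The "6 hops, about 10 cents" figure implies a specific per-hop rate. A small Python sketch of that arithmetic, under the assumption (not stated in the comment) that every hop takes the same flat fraction: solving (1 - r)^6 = 0.10 gives r of roughly 32% per hop. This only checks the arithmetic; it says nothing about whether real tax systems work this way.

```python
def remaining_after(amount: float, rate_per_hop: float, hops: int) -> float:
    """Value left after `hops` transfers, each losing the same fraction."""
    return amount * (1 - rate_per_hop) ** hops


def implied_rate(remaining_fraction: float, hops: int) -> float:
    """Flat per-hop rate r such that (1 - r) ** hops == remaining_fraction."""
    return 1 - remaining_fraction ** (1 / hops)


if __name__ == "__main__":
    r = implied_rate(0.10, 6)
    print(f"{r:.1%}")                            # 31.9%
    print(round(remaining_after(1.0, r, 6), 2))  # 0.1
```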

I essentially believe that the economy is fake. That people get money due to mostly social factors and then make up plausible narratives to explain their success in a way which omits all the critical social elements... And these explanations sound plausible to people in their social circle who are at a similar distance from a money printer so the false beliefs and perceptive distortions are socially validated.

I also believe I'm being persecuted and algorithms are suppressing me for seeing through the scheme and for my ability to explain complex issues simply.

That sounds like a belief, but not necessarily a religious belief.

This is different than a belief system protected by anti discrimination laws

You were building on Tempo, and they deplatformed you?

No. My Blockchain had nothing to do with payments. The Blockchain is for authentication (protection from fake accounts) and tokens represented credits and licensing rights within the platform. The token wasn't even listed on any marketplace.

“Banks are afraid of only two things: regulators and their wives, and they’re more afraid of their wives than the regulators”

I would retroactively make that quote gender neutral but they're really not afraid of their husbands.

Financial institutions feel that blockchains don’t have a clear KYC/AML trail. They don’t care about KYC/AML as such; they care about what a violation would do to their relationship with the regulator.

I switched to a different, smaller payment provider. It was pretty easy to switch. No problems at all there. I wonder why I even wanted to use Stripe in the first place. You'd think with their size they wouldn't have to fear regulators. These big companies usually have all the regulators in their pocket.

Stripe is right in the middle

Smaller institutions take risks with a niche, and gun for exceptions from the regulators that bigger institutions don't find worthwhile to bother with.

And the biggest institutions dgaf because their relationship with the national government will never be broken
