While I have had my time fighting the OOM killer, I believe overcommit would have always won. To torture the metaphor a bit more, airlines have an OOF mechanism: they just eject the overcommitted passengers before the plane takes off.
A passenger buying a ticket is malloc(), but passengers don't always utilize the seat (use the memory). Normally this works out fine, but occasionally there are too many passengers. Thankfully, though, instead of executing a couple of passengers, they give you a voucher.
I know this is not a popular / mainstream position, but I managed a very large fleet of systems this way:
- no system swap
- enough memory for core system services set aside in a cgroup for them to use
- by default, all prod service binaries load all code pages into ram at start, and lock them in (no paging out code pages at runtime)
- if needed (rare) services can mount some swap in their own cgroup, but very much discouraged
You need to know how much ram you are going to use, and actually stick to that. Very little is wasted in practice, and you don't have to deal with OOMs all the time. Everything is much more predictable.
I agree with your perspective. I certainly agree that swap can be invaluable at times, and that it is generally a mistake for your run-of-the-mill production services.
It's a nice approach particularly because all OOMs become actionable: there's a bug in a service or a limit is wrong or traffic is changing in an unexpected way.
Systems built this way end up being extremely reliable in my experience.
It's an uphill battle both ways though and not everyone is up for that experience.
I still remember following Andries’s “Linux kernel hacker’s hut” course he taught at the Eindhoven University of Technology (TU/e) back in 2010. Every week we’d get an assignment where we had to write exploits for commonly occurring security vulnerabilities (e.g., buffer overflows, bad printf format). It was one of the most enjoyable courses I ever followed. Thanks for that, Andries!
Hey fellow TU/e'er :) I followed his course as well, somewhere around 2004/5. Executing man in the middle attacks, writing buffer overflow exploits. Good memories!
It's 2026 and I still can't configure the OOM killer to kill firefox before anything else.
I looked into this, and actually, it seems like maybe you can? https://man7.org/linux/man-pages/man5/proc_pid_oom_score_adj...
So, in actuality, I think your assertion just taught us all something, because despite knowing that the OOM killer and the Magic SysRq key[1] exist, I didn't know you could configure this as an input!
[1]: https://en.wikipedia.org/wiki/Magic_SysRq_key
I'm aware of it, but it's awkward to use in practice. You have to track down all the FF processes, each time you run it, and adjust all their scores.
You could launch it as a systemd user service with OOMScoreAdjust=500 in the [Service] section; weird and unconventional, but wrapped in a .desktop file it doesn't appear to be unwieldy.
Ah. Yes, that is awkward. Well, nonetheless, you taught me a new feature. Thanks!
sounds like a job for a program
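For what it's worth, a minimal sketch of such a program, assuming the processes can be matched by the name in /proc/<pid>/comm. The function name `adjust_oom_scores` and the `proc_root` parameter are made up for illustration. comm is truncated to 15 characters, and Firefox content processes may report a different name than "firefox", so check what your system actually shows before relying on the match.

```python
from pathlib import Path
from typing import List


def adjust_oom_scores(name: str, score: int, proc_root: str = "/proc") -> List[int]:
    """Write `score` to oom_score_adj for every process whose comm equals `name`.

    Returns the sorted PIDs that were updated. Processes that exit mid-scan,
    or that we are not allowed to modify, are skipped.
    """
    if not -1000 <= score <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    updated = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a per-process directory
        try:
            if (entry / "comm").read_text().strip() != name:
                continue
            (entry / "oom_score_adj").write_text(f"{score}\n")
            updated.append(int(entry.name))
        except OSError:
            continue  # process vanished or permission denied
    return sorted(updated)
```

Calling `adjust_oom_scores("firefox", 500)` from a timer or a launcher wrapper would cover the "each time you run it" part. Raising the score of your own processes should not need extra privileges; lowering it generally does.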
Maybe firefox could self-adjust, as a policy?
I always wanted it to target java processes, as they were always the culprit. These days it's python, VSCode, and antigravity.
This. It's always the browser running amok. I configured the win+k shortcut key to: killall -9 chrome
Maybe not in kernel, but running the earlyoom daemon will let you do exactly that in userspace.
It's not a panacea, but in my case setting browser.tabs.unloadOnLowMemory in about:config helped a bunch.
I never pay for the OOF insurance, it seems like a waste of money and I've never met anyone that's had it happen.
It can only happen once anyway, and I fly weekly!
Happy to see this trending, I probably share this in my company's slack once a month.
I confess, this is very funny and the underlying situation is a bit absurd, but it's unclear what point Brouwer is making by pointing out the absurdity.
There surely is something absurd about having to register specific processes as exempt from the OOM killer. But given that the OOM killer exists, and could kill xlock...how should that be fixed?
I think part of it is that the design of screen lockers on X11 is just broken. If the locker crashes (or is killed), then the screen unlocks. Security-wise, it fails open. On Windows and macOS (and Wayland, using the ext-screen-lock protocol, coupled with sane compositor policy), that can't happen.
The right way for this to work is for the X server to have an extension that lets a screen locker say "hey, I'm locking the screen now", and the X server should respond to that by pretending that the screen locker client is the only client that exists: no other client gets input or gets to draw. And if the screen locker crashes (or is killed), the X server should just put itself into a permanently-locked state where it will never again send any input to anything, and won't ever draw anything except a blank screen. That's not a desirable situation, of course, but it's better than unlocking the screen.
I read him as arguing that overcommit was a mistake. Of course, he doesn't answer any of the obvious follow-up questions, such as, does fork–exec copy all the process's memory and then immediately throw it away, or what. (One could argue that fork–exec was also a mistake, but it long predates Linux, so this doesn't answer the question of how Torvalds should have designed it.)
> does fork–exec copy all the process's memory
NT: Yes? Why not?
(note that this refers to the Windows NT kernel's operation, because it historically had a POSIX emulation layer (NT Personalities), not the modern WSL, which is just Linux in a Hyper-V VM)
The point is that the OOM killer shouldn't exist and arguing about how to tweak it is addressing the wrong problem
But the second clause doesn't follow from the first!
I don't think Linux was plausibly going to remove the OOM killer in 2004 or later. So the right solution for Linux is very much to tweak it to be less painful.
I agree that that's the point he's making, but I don't see how that would work practically. His attitude is that malloc(1<<63) should immediately crash the system, every time? How is that better?
No, if a process allocates an infeasible amount, malloc fails and the process needs to deal with the failure (which is what already happens, "malloc doesn't fail on Linux" is only really true for smaller-than-page-size allocations). The point being made is that the system should account conservatively for all memory that can be used, not just the optimistic underestimate that overcommit enables (i.e. the plane should always carry enough fuel for contingencies, and landing with extra fuel is a good outcome).
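Linux does have a mode that accounts roughly this way: with vm.overcommit_memory=2 the kernel refuses new allocations once Committed_AS would exceed CommitLimit. The kernel documentation gives the limit as (RAM minus huge pages) * overcommit_ratio / 100 + swap, with the ratio defaulting to 50. A small sketch of that arithmetic; the function names are mine, and the input is assumed to be /proc/meminfo-style text.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def commit_limit_kb(mem_total_kb: int, swap_total_kb: int,
                    overcommit_ratio: int = 50, hugetlb_kb: int = 0) -> int:
    """CommitLimit as documented for vm.overcommit_memory=2."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


def commit_headroom_kb(meminfo_text: str) -> int:
    """How much more could be committed before allocations start failing."""
    info = parse_meminfo(meminfo_text)
    return info["CommitLimit"] - info["Committed_AS"]
```

Note that with no swap and the default ratio, `commit_limit_kb(16_000_000, 0)` comes out at only half of RAM, so a swapless machine in strict mode probably wants the ratio raised.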
FreeBSD has a "protect" command which does something similar to what this asks for – the man page [1] describes it:
"The protect command is used to mark processes as protected. The kernel does not kill protected processes when swap space is exhausted. [...] If you protect a runaway process that allocates all memory the system will deadlock."
[1] https://man.freebsd.org/cgi/man.cgi?query=protect&apropos=0&...
I’d say, let the one who tried to allocate memory crash, and if you’re a critical process like xlock, use statically allocated memory and don’t alloc again.
Statically allocated memory can still OOM on access, due to overcommit and lazy page table population. What you really want is mlockall(2) (probably with MCL_CURRENT|MCL_ONFAULT followed by madvise with MADV_POPULATE_*)
oops MCL_ONFAULT kinda does the opposite of what I wanted - I think if you omit that you can skip the madvise, and mlockall will populate everything for you.
> if you’re a critical process like xlock, use statically allocated memory and don’t alloc again.
This doesn't save you if someone else allocates and the OOM killer chooses you as the victim.
What is proposed is to not have an OOM killer with a selection process, meaning that the "someone else" who allocates would be the one dying.
The problem is that Linux has memory overcommit and it will OOM when a process faults a page in, not just when someone allocates memory.
So the OOM condition can hit any random process, not necessarily one that just tried to allocate. If you don't have some sort of selection, then you would still have an OOM killer, only it will be killing completely at random.
That's true, but critical processes could mlockall() after setup, so their stuff never needs paging in.
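Roughly what that looks like, as a sketch: a real locker would call mlockall(2) directly from C, but the call is the same one through ctypes. The flag values below are the x86/arm Linux ones from <sys/mman.h> and are an assumption; a few architectures use different numbers. `lock_all_memory` is a name I made up.

```python
import ctypes
import ctypes.util

# Assumed values (x86 and arm Linux); check <sys/mman.h> on other architectures.
MCL_CURRENT = 1  # lock every page that is mapped right now
MCL_FUTURE = 2   # also lock pages that get mapped later


def lock_all_memory(flags: int = MCL_CURRENT | MCL_FUTURE) -> int:
    """Call mlockall(2); return 0 on success, otherwise the errno value."""
    libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)
    if libc.mlockall(ctypes.c_int(flags)) == 0:
        return 0
    return ctypes.get_errno()
```

Expect a non-zero result (typically ENOMEM or EPERM) unless RLIMIT_MEMLOCK is large enough or the process has CAP_IPC_LOCK. Locking only stops the process from faulting on its own pages; it does not stop the OOM killer from picking it.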
Yes, don’t have OOM roulette.
This is only a viable answer when overcommit is disabled. The problem comes when overcommit is enabled and you find yourself in a position where many programs think they already have memory and yet there is none to give them. If you simply kill the first piece of code that encounters the end of available memory you might take down anything including the kernel itself.
Nothing like statically allocating memory can work when overcommit is enabled, because the kernel is free to compress memory, page it out, etc., and then murder you the next time you try to perform any operation that it doesn't have the space for, no matter how safe and static your initialization was.
Note that overcommit is very useful in many cases including the ones where swap saves the stability of the system under conditions that would otherwise completely lock up or panic, so it's also not viable to just prevent it from being used.
OOM killer always felt like a band-aid on a severed artery to me. I've rarely seen a machine that got into OOM state really recover without a full reboot.
Why would a system break if you SIGKILL a process?
I’ve seen plenty of server logs with the OOM killer killing mariadb processes, which were then restarted automatically by systemd, often with no one noticing until days later.
The thing that bogs down systems and often makes them unrecoverable is when a memory hungry process starts swapping. Good luck trying to SSH in. Swap is such a silly idea on servers - good to deal with pages no one accesses, catastrophic when you’re out of RAM and memory latencies suddenly become 4 or 5 orders of magnitude slower.
I’m not against taking down the kernel if the situation is that catastrophic. Better than killing the lock screen for sure.
Shouldn't desktop environments detect if a lock screen terminated abnormally anyway? The OOM killer is just one of many possible causes.
IMO if the security of a system depends on the lock screen not crashing then the system is not very secure. Security protocols should never fail open like that; a lock screen should never simply be a layer on top of the authenticated desktop. Windows and macOS get this right. I believe Wayland display managers are also able to get this right (but I haven't checked).
Yes, Wayland should fix this. Granted, then you have a locked screen that the user may or may not be able to unlock, which is awkward, if better.
The fact that xlock crashing unlocks an X11 session is, IMO, pathetic.
looking forward to your other insights
(2004)
Thanks. I was confused for a bit, given these days you can do
to disable OOM killing for a process. https://github.com/torvalds/linux/blob/master/include/uapi/l...
There's also /proc/sys/vm/panic_on_oom and /proc/sys/vm/oom_kill_allocating_task for other behaviours suggested in the comments.
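To see which of these behaviours a machine is currently configured for, something like the sketch below should do. I added overcommit_memory to the list since it came up elsewhere in the thread; `read_oom_policy` and its `vm_dir` parameter are illustrative names, not an existing tool.

```python
from pathlib import Path
from typing import Dict, Optional

OOM_SYSCTLS = ("panic_on_oom", "oom_kill_allocating_task", "overcommit_memory")


def read_oom_policy(vm_dir: str = "/proc/sys/vm") -> Dict[str, Optional[int]]:
    """Return the integer value of each OOM-related sysctl.

    A knob that is missing or unreadable maps to None.
    """
    policy: Dict[str, Optional[int]] = {}
    for name in OOM_SYSCTLS:
        try:
            policy[name] = int((Path(vm_dir) / name).read_text().strip())
        except (OSError, ValueError):
            policy[name] = None
    return policy
```

For reading the result: panic_on_oom is 0 (kill a process), 1 or 2 (panic); a non-zero oom_kill_allocating_task kills whichever task triggered the OOM instead of scanning for a victim; overcommit_memory is 0 (heuristic), 1 (always overcommit) or 2 (strict accounting).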
Especially in an era where RAM is so expensive, the obvious answer is to simply never use memory. If your data can't fit in the plethora of CPU registers at your disposal, your software is probably too complicated. /s
I see you are an AMD VCACHE enjoyer.