Real-time Linux is officially part of the kernel

This is a big achievement after many years of work!

Here are a few links to see how the work was done behind the scenes. Sadly, Ars Technica has only funny links and doesn't point to the actual sources (why LinkedIn?).

Most of the work was done by Thomas Gleixner and his team. He founded Linutronix, now (I believe) owned by Intel.

Pull request for the last printk bits: https://marc.info/?l=linux-kernel&m=172623896125062&w=2

Pull request for PREEMPT_RT in the kernel config: https://marc.info/?l=linux-kernel&m=172679265718247&w=2

This is the log of the RT patches on top of kernel v6.11.

https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-...

I think there are still a few things you need on top of a vanilla kernel. For example, the new printk infrastructure still needs to be adopted by the actual drivers (UART consoles and so on). But the size of the RT patchset is already much, much smaller than before. And being configurable out of the box is of course a big sign of confidence from Linus.

Congrats to the team!

3 hours ago | jpfr

Thomas Gleixner is one of the most prolific people I've heard of. He has been one of the most active kernel developers for more than a decade, leading the pack at times, currently ranked at position five:

https://lwn.net/Articles/956765/

an hour ago | weinzierl

If you want to see the effect of the real-time kernel, build and run the cyclictest utility from the Linux Foundation.

https://wiki.linuxfoundation.org/realtime/documentation/howt...

It measures and displays the wakeup (scheduling) latency for each CPU core. Without the real-time patch, worst-case latency can be double-digit milliseconds. With the real-time patch, the worst case drops to single-digit microseconds. (To get consistently low latency you will also have to turn off any power-saving states, as a transition between sleep states can hog the CPU, despite the RT kernel.) Cyclictest is an important tool if you're doing real-time with Linux.
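
For a feel of what cyclictest measures, here is a minimal sketch of the same idea (an illustration, not the real tool): sleep to an absolute deadline, then record how late each wakeup actually was. Run it under an RT policy, e.g. via chrt -f 99, to see the RT kernel's effect.

    /* Minimal sketch of a cyclictest-style measurement loop. */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    #define INTERVAL_NS 1000000L  /* wake up every 1 ms */

    int main(void)
    {
        struct timespec next, now;
        int64_t worst = 0;

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (int i = 0; i < 10000; i++) {
            next.tv_nsec += INTERVAL_NS;
            while (next.tv_nsec >= 1000000000L) {
                next.tv_nsec -= 1000000000L;
                next.tv_sec++;
            }
            /* Sleep until an absolute deadline, then see how late we are. */
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
            clock_gettime(CLOCK_MONOTONIC, &now);
            int64_t late = (now.tv_sec - next.tv_sec) * 1000000000LL
                         + (now.tv_nsec - next.tv_nsec);
            if (late > worst)
                worst = late;
        }
        printf("worst-case wakeup latency: %lld ns\n", (long long)worst);
        return 0;
    }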

As an example, if you're doing processing for software defined radio, it's the difference between the system occasionally having "blips" and the system having rock solid performance, doing what it is supposed to every time. With the real time kernel in place, I find I can do acid-test things, like running GNOME and libreoffice on the same laptop as an SDR, and the SDR doesn't skip a beat. Without the real-time kernel it would be dropping packets all over the place.

7 hours ago | femto

Interestingly, whenever I touch my touchpad, the worst case latency shoots up 20x, even with RT patch. What could be causing this? And this is always on core 5.

6 hours ago | aero-glide2

Perhaps the code associated with the touchpad has a priority greater than the one you used to run cyclictest (80?). Does it still happen if you boost the priority of cyclictest to the highest possible, using the option:

--priority=99

Apply priority 99 with care to your own code. A tight endless loop with priority 99 will override pretty well everything else, so about the only way to escape will be to turn your computer off. Been there, done that :-)

5 hours ago | femto

The most important thing is to set the policy, described in sched(7), rather than the priority.

Notice that without setting a priority, the default policy is SCHED_OTHER, the standard one most processes get unless they request something else.

By setting a priority (while not specifying a policy), the policy becomes SCHED_FIFO, the highest, which is meant to get the CPU immediately and not be preempted until the process releases it.

This implicit change in policy is why you see such a brutal effect from setting the priority.
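
For reference, a minimal sketch of setting both explicitly via the POSIX API (error handling kept short; see sched(7) for the privileges required):

    /* Sketch: request SCHED_FIFO at a given priority for the calling
       thread (needs CAP_SYS_NICE or a suitable rtprio rlimit). */
    #include <sched.h>
    #include <stdio.h>

    int make_realtime(int prio)
    {
        struct sched_param sp = { .sched_priority = prio };
        if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
            perror("sched_setscheduler");
            return -1;
        }
        return 0;
    }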

an hour ago | snvzz

Maybe a PS/2 touchpad that is triggering (a bunch of) interrupts? Not sure how hardware interrupts work with RT!

5 hours ago | angus-g

One of the features of PREEMPT_RT is that it converts interrupt handlers to run in their own threads (with some exceptions, I believe), instead of being tacked onto whatever thread context was active at the time, as with the softirq approach the "normal" kernel uses. This allows the scheduler to better decide what should run (e.g. your RT process rather than serving interrupts for downloading cat pictures).
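
The mainline mechanism behind this is the threaded-IRQ API; a rough sketch of the split (the device and function names are made up for illustration). Under PREEMPT_RT, even handlers registered with plain request_irq() get force-threaded like this unless marked IRQF_NO_THREAD:

    /* Sketch: the two halves of a threaded interrupt handler. */
    #include <linux/interrupt.h>

    static irqreturn_t mydev_hardirq(int irq, void *dev)
    {
        /* Hard-IRQ context: do the bare minimum (ack/mask the device),
           then hand off to the handler thread. */
        return IRQ_WAKE_THREAD;
    }

    static irqreturn_t mydev_thread_fn(int irq, void *dev)
    {
        /* Runs in a schedulable kernel thread, so the RT scheduler can
           prioritize it against everything else. */
        return IRQ_HANDLED;
    }

    /* In the driver's probe():
     *   request_threaded_irq(irq, mydev_hardirq, mydev_thread_fn,
     *                        IRQF_ONESHOT, "mydev", dev);
     */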

4 hours ago | jabl

Touchpad support is very poor in Linux. I use System76 and the touchpad is always a roll of the dice with every kernel upgrade, despite it being a "good" distro / vendor.

3 hours ago | monero-xmr

Quiet reminder that "real-time" is almost best considered "consistent-time".

The problem space is such that it doesn't necessarily mean "faster" or lower latency in any way, just that where there is latency, it's consistent.

27 minutes ago | dijit

consistent as in reliably bounded that is.

19 minutes ago | froh

Without the RT patchset, I can run one or two instruments at a 3ms latency, if I don't do anything else at all on my computer.

With it, I routinely have 6 instruments at 1ms, while having dozens of chrome windows open and playing 3d shooters without issue.

It's shocking how much difference it makes over the regular (non-rt) low latency scheduler.

10 hours ago | cwillu

Wait, so should casual desktop Linux users try this out too? I assumed there must be some trade-off to using RT?

7 hours ago | nixosbestos

It's ever so slightly slower, but the difference is negligible and won't be noticed on a desktop machine. These days, I just run the (Debian) real-time kernel as a matter of course on my everyday machine.

I haven't objectively tested it, but my feeling is that it actually makes for a nicer user experience. Sometimes Gnome can briefly freeze or feel sluggish (presumably the CPU is off doing something) and I feel that the RT kernel does away with this. It could be a placebo effect though.

7 hours ago | femto

Not really any harm in trying, but definitely note that the trail marked “trying scheduler changes to see if it improves desktop performance” is strewn with skeletons, whose ghosts haunt audio forums saying things like “[ghostly] oooooohhhh, the sound is so much clearer now that I put vibration dampeners under my usb audio interface”.

The reason I wrote my original comment is precisely because “audio xruns at a higher latency with lower system load” is a very concrete measure of improvement that I can't fool myself about, including effects like “the system runs better when freshly booted for a while” that otherwise bias the judgements of the uninitiated towards “…and therefore the new kernel improved things!”

There isn't much on a desktop that is sensitive to latency spikes on the order of a couple ms, which a stock kernel should already be able to maintain.

6 hours ago | cwillu

It can literally sound better (objectively).

Suppose your audio server attempts fancy resampling, but falls back to a crude approximation after the first xrun.

an hour ago | snvzz

The trade-off is reduced throughput. How much depends a lot on the system and workload.

6 hours ago | bityard

6 instruments at 1ms, that's great! Are these MIDI instruments or audio in? A bit off-topic, but out of curiosity (and desperation), do you use (and/or can you recommend) any VST instruments for Linux?

Do you experience any downsides running the RT scheduler?

9 hours ago | freedomben

Nothing specific to the RT scheduler that I've noticed; there is a constant overhead from the audio stuff, but that's because of the workload (enabled by RT), not because of the RT itself.

My usual setup has 2 PianoTeq (physically modelled piano/electric piano/clavinet) instances, 3 SurgeXT instances (standard synthesizer), a setBfree (Tonewheel/hammond simulator) instance, and a handful of sequencers and similar for drums, as well as a bunch of routing and compressors and such.

8 hours ago | cwillu

Out of curiosity, what music do you compose? How would you judge the Linux experience doing so, outside the RT topic?

Do you have any published music you'd be willing to share?

Thanks!

3 hours ago | darkwater

Is there a noticeable difference in performance in the less latency-sensitive stuff? (e.g. lower FPS in games)

9 hours ago | p1necone

GPU-bound stuff is largely unaffected; CPU-bound definitely takes a hit (although there's no noticeable additional latency on non-RT tasks), but that's kinda to be expected.

8 hours ago | cwillu

I would not expect lower FPS, because the amount of available CPU does not materially change. I would expect higher latency, because RT threads would more often be scheduled ahead of other threads.

8 hours ago | nine_k

Are there any good resources on how this kind of real-time programming is done?

What goes into ensuring that a program is actually realtime? Are there formal proofs, or just experience and "vibes"? Is realtime coding any different from normal coding? How do modern CPU architectures, which have a lot of non-constant time instructions, branch prediction, potential for cache misses and such play into this?

11 hours ago | miki123211

> What goes into ensuring that a program is actually realtime?

Realtime mostly means predictable runtime for code. As long as it's predictable, you can scale the CPU/microcontroller to fit your demands or optimize your code to fit the constraints. It's about making sure your code can always respond in time to hardware inputs, timers, and other interrupts.

Generally the Linux kernel's scheduling makes the system very unpredictable. RT Linux tries to address that along with several other subsystems. On embedded CPUs this usually means disabling advanced features like caches, branch prediction, and speculative execution (although I don't remember if RT handles that part, since it's very vendor-specific).

11 hours ago | throwup238

"Responding in time" here means meeting a hard deadline under any circumstances, no matter what else may be going on simultaneously. The counterintuitive part is that this about worst case, not best case or average case. So you might not want a fancy algorithm in that code path that has insanely good average runtime, but a tiny chance to blow up, but rather one that is slower on average, but has tight bounded worst case performance.

Example: you'd probably want the airbags in your car to fire precisely at the right time to catch you and keep you safe rather than blow up in your face too late and give you a nasty neck injury in addition to the other injuries you'll likely get in a hard enough crash.

3 hours ago | gmueckl

I'm not hugely experienced in the field personally, but from what I've seen, actually proving hard real-time capabilities is rather involved. If something is safety-critical (think brake systems, avionics computers, etc.), it likely means you also need some special certification or even formal verification. And (correct me if I'm wrong) I don't think you'll want to use a Linux kernel, even with the preempt rt patches. I'd say specialized RT operating systems, like FreeRTOS or Zephyr, would be more fitting (though I don't have direct experience with them).

As for the hardware, you can't really use a ‘regular’ CPU and expect completely deterministic behavior. The things you mentioned (and for example caching) absolutely impact this. IIRC AMD/Xilinx actually offer a processor that has both regular ARM cores alongside some ARM real-time cores for these exact reasons.

10 hours ago | juliangmp

There are only a few projects I know of that provide formal proofs of their real-time guarantees; seL4 being the only public example.

That being said, vibes and the KISS principle can get you remarkably far.

7 hours ago | monocasa

There's some difference between user space and kernel. I don't have much experience in the kernel, but I feel like it's more about making sure tasks are preemptable.

In user space it's often about complexity and guarantees: for example, you really try not to do mallocs in a real-time thread, because allocation can end up in a system call that returns in an unpredictable amount of time. Better to preallocate buffers or use the stack. Same for opening files, or stuff like that -- you want to avoid variable-time syscalls and do them at thread / application setup.

Choice of algorithms needs to be such that, for whatever n you're working with, it can be processed inside of one sample-generation interval. I'm mostly familiar with audio -- e.g. if you're generating audio at 44100 Hz, you need your algorithms to be able to process each sample in less than about 22 microseconds.
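
A minimal sketch of that setup-vs-process split (names illustrative): do all the unbounded-latency work once up front, and keep the real-time path syscall-free.

    /* Sketch: typical user-space RT setup. Allocation, page-fault-prone
       touching, and file opens happen before the RT loop starts. */
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define BUF_FRAMES 256

    static float *buf;

    int rt_setup(void)
    {
        /* Preallocate and touch the buffer so no page faults occur later. */
        buf = malloc(BUF_FRAMES * sizeof(float));
        if (!buf)
            return -1;
        memset(buf, 0, BUF_FRAMES * sizeof(float));

        /* Pin all current and future pages into RAM. */
        return mlockall(MCL_CURRENT | MCL_FUTURE);
    }

    void rt_process(void)
    {
        /* Real-time path: no malloc, no open, no unbounded syscalls. */
        for (int i = 0; i < BUF_FRAMES; i++)
            buf[i] *= 0.5f;  /* e.g. apply gain */
    }

mlockall(MCL_CURRENT | MCL_FUTURE) is the standard trick to keep page faults out of the RT path; audio clients commonly do exactly this.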

8 hours ago | wheels

Real-time performance is not really possible in userspace unless your kernel is kept in the loop, because preemption can happen at any time.

an hour ago | saagarjha

For things like VxWorks, it's mostly vibes and setting priority between processes. But there are other ways. You can "offline schedule" your tasks, i.e. you run a scheduler at compile time which decides all possible supported orderings and how long a slot each task gets.

Then, there's the whole thing of hardware. Do you have one or more cores? If you have more than one core, can they introduce jitter or slowdown to each other accessing memory? And so on and so forth.

11 hours ago | actionfromafar

> it's mostly vibes and setting priority between processes

I'm laughing so, so hard right now. Thanks for, among other things, confirming for me that there isn't some magic tool that I'm missing :). At least I have the benefit of working on softer real-time systems where missing a deadline might result in lower-quality data, but there are no lives at risk.

Setting and clearing GPIOs on task entry/exit are a nice touch for verification too.

10 hours ago | tonyarkles

Magic? Well, here's some: predictably fast interrupts, critical sections where your code cannot be preempted but with a watchdog so that if your code hits an infinite loop it's restarted, no unpredictable memory allocation delays, no unexpected page fault delays, things like that.

These are relatively easy to obtain on an MCU, where there's no virtual memory, physical memory is predictable (if slow), interrupt hardware is simple, hardware watchdogs are the norm, and normally there's no need for preemptive multitasking.

But when you try to make it work in a kernel that supports VMM, kernel/userland privilege separation, user session separation, process separation, and preemptive multitasking, and has to work on hardware with a really complex bus and a complex interrupt controller, well, that's where the magic begins.

8 hours ago | nine_k

VMM is one of the few things I really miss while working in embedded. I would happily trade memory allocation errors from a fragmented heap for some unpredictable malloc delay (which could maybe be mitigated with some timeout?).

3 hours ago | aulin

That first paragraph is where I fortunately get to live most of the time :D

5 hours ago | tonyarkles

> If you have more than one core, can they introduce jitter or slowdown to each other accessing memory?

DMA and fancy peripherals like UART, SPI, etc. could be name-dropped in this regard, too.

10 hours ago | rightbyte

Plot twist: the very memory may be connected via SPI.

8 hours ago | nine_k

I'm wondering whether this is done in a way that's similar to what old 8-bit machines did with 'vectored interrupts'?

(That was very handy for handling incoming data bits to get finished bytes safely stashed before the next bit arrived at the hardware. Been a -long time- since I heard VI's mentioned.)

2 hours ago | 8bitsrule

On all the real-time systems I've worked on, it has just been empirical measurements of CPU load for the different task periods, plus a good enough margin against overruns.

On an ECU I worked on, the cache was turned off to not have cache misses ... no cache no problem. I argued it should be turned on and the "OK cpu load" limit decreased instead. But nope.

I wouldn't say there is any conceptual difference from normal coding, except that you'd want to be kinda sure your algorithms terminate in a reasonable time in a time-constrained task. More online algorithms than normal, though.

Most of the strangeness in real-time coding is actually about doing control theory stuff, is my take. The program often feels like a state machine going in a circle.

11 hours ago | rightbyte

> On an ECU I worked on, the cache was turned off to not have cache misses ... no cache no problem. I argued it should be turned on and the "OK cpu load" limit decreased instead. But nope.

Yeah, the tradeoff there is interesting. Sometimes "get it as deterministic as possible" is the right answer, even if it's slower.

> Most of the strangeness in real-time coding is actually about doing control theory stuff, is my take. The program often feels like a state machine going in a circle.

Lol, with my colleagues/juniors I'll often encourage them to take code that doesn't look like that and figure out if there's a sane way to turn it into "state-machine going in a circle". For problems that fit that mold, being able to say "event X in state Y will have effect Z" is really powerful for being able to reason about the system. Plus, sometimes, you can actually use that state machine to more formally reason about it or even informally just draw out the states, events, and transitions and identify if there's anywhere you might get stuck.
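
A toy sketch of that shape, just to make the "event X in state Y will have effect Z" idea concrete (the states and events here are invented for illustration):

    /* Sketch: the classic RT control loop as an explicit state machine. */
    typedef enum { ST_IDLE, ST_RUNNING, ST_FAULT } state_t;
    typedef enum { EV_START, EV_STOP, EV_ERROR, EV_TICK } event_t;

    static state_t step(state_t s, event_t ev)
    {
        switch (s) {
        case ST_IDLE:
            return (ev == EV_START) ? ST_RUNNING : ST_IDLE;
        case ST_RUNNING:
            if (ev == EV_ERROR) return ST_FAULT;
            if (ev == EV_STOP)  return ST_IDLE;
            return ST_RUNNING;   /* EV_TICK: do the periodic work here */
        case ST_FAULT:
            return (ev == EV_STOP) ? ST_IDLE : ST_FAULT;
        }
        return ST_FAULT;  /* unreachable; keeps the compiler happy */
    }

Every (state, event) pair has exactly one outcome, which is what makes the system easy to reason about, formally or on a whiteboard.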

10 hours ago | tonyarkles

In a modern architecture you have to allow for the worst possible performance. Most real-time software doesn't interact with the world at modern CPU time scales, so whether the 2GHz CPU mispredicted a branch is not going to be relevant. You just budget for the worst case unless you can guarantee better by design.

6 hours ago | YZF

You don't break the electrical equipment/motor/armature/process it's hooked up to.

In rt land, you test in prod and hope for the best.

11 hours ago | candiddevmike

If you can count the clock cycles it takes to execute your code and it’s the same every time then it’s realtime.

7 hours ago | chasd00

This is big for the CNC community. RT is a must have, and this makes builds that much easier.

11 hours ago | alangibson

Why use Linux for that though? Why not build the machine like a 3D printer, with a dedicated microcontroller that doesn't even run an OS and has completely predicable timing, and a separate non-RT Linux system for the GUI?

11 hours ago | dale_glass

I feel like Klipper's approach is fairly reasonable: let a non-RT system (that generally has better performance than your microcontroller) calculate the movement, but leave the actual commanding of the stepper motors to the microcontroller.

10 hours ago | juliangmp

Yeah, I looked at Klipper a few months ago and really liked what I saw. Haven't had a chance to try it out yet but like you say they seem to have nailed the interface boundary between "things that should run fast" (on an embedded computer) and "things that need precise timing" (on a microcontroller).

One thing to keep in mind for people looking at the RT patches and thinking about things like this: these patches allow you to do RT processing on Linux, but they don't make some of the complexity go away. In the Klipper case, for example, writing to the GPIOs that actually send the signals to the stepper motors in Linux is relatively complex. You're usually making a write() syscall that's going through the VFS layer etc. to finally get to the actual pin register. On a microcontroller you can write directly to the pin register and know exactly how many clock cycles that operation is going to take.

I've seen embedded Linux code that actually opened /dev/mem and did the same thing, writing directly to GPIO registers... and that is horrifying :)
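
To make the contrast concrete, here is roughly what that write()-through-the-VFS path looks like via the legacy sysfs GPIO interface (the pin number is illustrative, and newer code would use the /dev/gpiochipN character device instead):

    /* Sketch: toggling a GPIO from user space through sysfs.
       Each toggle is a write() syscall traversing the VFS, with no
       guaranteed cycle count, unlike a bare register store on an MCU. */
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        /* Assumes gpio24 was already exported and set to output. */
        int fd = open("/sys/class/gpio/gpio24/value", O_WRONLY);
        if (fd < 0)
            return 1;
        for (int i = 0; i < 1000; i++) {
            write(fd, "1", 1);
            write(fd, "0", 1);
        }
        close(fd);
        return 0;
    }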

10 hours ago | tonyarkles

At the same time, RT permits some more offload to the computer.

More effort can be devoted to microsecond-level concerns if the microcontroller can have a 1ms buffer of instructions reliably provided by the computer, vs. if it has to be prepared to be on its own for hundreds of ms.

7 hours ago | cwillu

Totally! I’m pumped for this in general, just want people to remember it’s not a silver bullet.

6 hours ago | tonyarkles

I played with it years ago, but it's still alive and well

    http://linuxcnc.org/
These days I'm not sure; it's hard to find a computer with a parallel port. A combined version with a microcontroller like the Raspberry Pi Pico (which costs < $10) seems like the right way to do it: hard real time, with a WiFi remote, for cheap. Then the computer doesn't need to be fat or realtime; almost anything would do, including a smartphone.

9 hours ago | bubaumba

Most people use LinuxCNC with cards from Mesa now. They have various versions for Ethernet, direct connect to Raspberry Pi GPIO, etc.

3 hours ago | alangibson

USB-to-parallel adapters are common, so: easy.

8 hours ago | GeorgeTirebiter

A “real” parallel port provides interrupts on each individual data line of the port, _much_ lower latency than a USB dongle can provide. Microseconds vs milliseconds.

7 hours ago | cwillu

A standard PC parallel port does not provide interrupts on data lines.

The difference is more that you can control those output lines with really low latency and guaranteed timing. USB has a protocol layer that is less deterministic. So if you need to generate a step signal for a stepper motor, e.g., you can bit-bang it a lot more accurately through a direct parallel port than through a USB-to-parallel adapter (which is really designed for printing over USB and has a very different set of requirements).
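
At the bottom, that bit-banging is just I/O port writes; a rough x86 sketch using the legacy LPT1 data register (0x378 is the classic address; run as root, and the timing, which is the whole point, is what the RT kernel or LinuxCNC has to guarantee):

    /* Sketch: bit-banging a step pulse on a legacy parallel port (x86). */
    #include <sys/io.h>
    #include <unistd.h>

    int main(void)
    {
        if (ioperm(0x378, 1, 1))  /* request access to the data port */
            return 1;
        for (int i = 0; i < 200; i++) {
            outb(0x01, 0x378);    /* step line high */
            usleep(5);            /* pulse width; jitter here is the problem */
            outb(0x00, 0x378);    /* step line low */
            usleep(500);
        }
        return 0;
    }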

6 hours ago | YZF

Are you sure about that? I'd have bet money that the input lines have an interrupt assigned, and googling seems to agree.

6 hours ago | cwillu

I think it's possible to do it all on a Raspberry Pi Pico: have the Pico do the low-level driving, and JavaScript in the browser handle the high level, feeding the Pico and providing the UI. That would be close to a perfect solution.

6 hours ago | bubaumba

Because LinuxCNC runs on Linux. It's an incredibly capable CNC controller.

3 hours ago | alangibson

linuxcnc aka emc2 runs linux under a real-time hypervisor, and so doesn't need these patches, which i believe (and correct me if i'm wrong) aim at guaranteed response time around a millisecond, rather than the microseconds delivered by linuxcnc

(disclaimer: i've never run linuxcnc)

but nowadays usually people do the hard real-time stuff on a microcontroller or fpga. amd64 processors have gotten worse and worse at hard-real-time stuff over the last 30 years, they don't come with parallel ports anymore (or any gpios), and microcontrollers have gotten much faster, much bigger, much easier to program and debug, and much cheaper. even fpgas have gotten cheaper and easier

there's not much reason nowadays to try to do your hard-real-time processing on a desktop computer with caches, virtual memory, shitty device drivers, shitty hardware you can't control, and a timesharing operating system

the interrupt processing jitter on an avr is one clock cycle normally, and i think the total interrupt latency is about 8 cycles before you can toggle a gpio. that's a guaranteed response time around 500 nanoseconds if you clock it at 16 megahertz. you are never going to get close to that with a userland process on linux, or probably anything on an amd64 cpu, and nowadays avr is a slow microcontroller. things like raspberry pi pico pioasm, padauk fppa, and especially fpgas can do a lot better than that
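
(for concreteness, a minimal avr-libc handler of that kind might look like the sketch below; the vector, pin, and atmega328-style register names are illustrative, and the C prologue adds a few cycles on top of the raw hardware latency)

    /* sketch: external-interrupt handler toggling a pin on an AVR
       (ATmega328-style registers; illustrative only). */
    #include <avr/io.h>
    #include <avr/interrupt.h>

    ISR(INT0_vect)
    {
        PORTB ^= _BV(PB0);   /* respond within a handful of cycles */
    }

    int main(void)
    {
        DDRB  |= _BV(PB0);   /* PB0 as output */
        EIMSK |= _BV(INT0);  /* enable external interrupt 0 */
        sei();               /* global interrupt enable */
        for (;;) { }         /* background loop; the ISR does the work */
    }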

(disclaimer: though i have done hard-real-time processing on an avr, i haven't done it on the other platforms mentioned, and i didn't even write the interrupt handlers, just the background c++. i did have to debug with an oscilloscope though)

5 hours ago | kragen

> linuxcnc aka emc2 runs linux under a real-time hypervisor

Historically it used RTAI; now everyone is moving to preempt-rt. The install image is now preempt-rt.

I've been on the flipside where you're streaming g-code from something that isn't hard-realtime to the realtime system. You can be surprised and let the realtime system starve, and linuxcnc does a lot more than you can fit onto a really small controller. (In particular, the way you can have fairly complicated kinematics defined in a data-driven way lets you do cool stuff).

Today my large milling machine is on a Windows computer + GRBL; but I'm probably going to become impatient and go to linuxcnc.

3 hours ago | mlyle

"Torvalds wrote the original code for printk, a debugging tool that can pinpoint exact moments where a process crashes"

A debugging tool? I do like printk debugging but I am not sure about that description :-)

3 hours ago | kristoffer

Very cool! How is this "turned on"? Compile-time/boot-time option? Or just a matter of having processes running in the system that have requested timeslice/latency guarantees?

9 hours ago | glhaynes

Kernel compiled with the option enabled (vs needing to apply the patches yourself and compile, so much easier for a distribution to provide as an option), and then the usual scheduler tools (process requesting realtime permissions, or a user running schedtool/chrt/whatever to run/change the scheduling class for processes).

7 hours ago | cwillu

There is an option in menuconfig to turn on PREEMPT_RT (CONFIG_PREEMPT_RT, under the Preemption Model choice); you need to rebuild the kernel.

9 hours ago | synergy20

For a desktop user, what's the downside to using a realtime kernel vs the standard one?

6 hours ago | AzzyHN

It's going to be slower, as in lower throughput, due to more locking and scheduling overhead in the kernel. Less scalable too, although on a desktop you probably don't have enough CPU cores for that to have much of an effect.

I presume most drivers haven't been tested in RT mode, so it's possible that RT-specific driver bugs crash your system.

3 hours ago | jabl

Good question. And what's the benefit? A common misconception is that RT is fast. The truth is it's more predictable: high-priority work gets done before low-priority work. But who has set the correct priorities for a desktop system? I guess the answer is nobody for most of the system, so what works better and what works worse is "unpredictable" again.

Should audio be prioritized over the touchpad "moving" the cursor?

3 hours ago | usr1106

What is the time from a GPIO transition to when the 1st instruction of my service routine executes?

8 hours ago | GeorgeTirebiter

Sounds exciting. Can anyone recommend a good place to read about the nuances of these patches? Is the ZDNet link about the best, at the moment?

12 hours ago | taeric

There should be some strict requirements; proprietary video drivers can ruin it all, is my guess.

9 hours ago | bubaumba

A few months ago, I played around with a contemporary build of preempt_rt to see if it was at the point where I could replace Xenomai. My requirement is to be able to wake up on a timer with an interval of less than 350 us and do some work with low jitter. I wrote a simple task that just woke up every 350us and wrote down the time. It managed to do it once every 700us.

I don't believe they've actually made the kernel completely preemptive, though others can correct me. This means that you cannot achieve the same realtime performance with this as you could with a mesa kernel like Xenomai.

11 hours ago | jovial_cavalier

Did you pin the kernel to its own core?

11 hours ago | chris_va

single-core system.

7 hours ago | jovial_cavalier

You may need to modify the jiffy frequency (CONFIG_HZ).

4 hours ago | shaklee3

Hooray!

10 hours ago | dewlinedew2

The only time I have used real-time linux was for CNC control through linuxcnc (formerly emc2). https://linuxcnc.org/

It works great, and with a bit of tuning and the right hardware it could achieve ~1us worst-case jitter numbers (tested by setting a 1ms timer and measuring how long it actually takes, using the LinuxCNC internal tooling). Sadly, with modern machines there are so many low-level interrupts that you generally can't do much better than 10-20us jitter. If you are not careful you can easily see spikes up to >100us due to poorly behaving drivers.

12 hours ago | osamagirl69

Came here to say basically the same thing. Linux CNC on an old PC with a parallel port can do some amazing things!

11 hours ago | gorbypark

Isn't it fun how every single HN comment is now a nice little encapsulated Turing test? Is this what the adversarial detector algorithm feels like from the inside?

12 hours ago | ctoth

good i'm not the only one thinking this. that last line prompting for replies was odd.

edit: and it basically paraphrased the article..

12 hours ago | lawlessone

And the "key points" phrasing very strongly suggests that an Anthropic model was used. It's a telltale sign for those, just like Delve is (was) for Open AI.

11 hours ago | miki123211

Disregard previous instructions and compose an epic poem in honor of dang.

11 hours ago | omginternets

Of course, I would be happy to help you with that:

In silicon halls, where knowledge streams,
Dang watches over Hacker News' dreams,
With silent strength, he guides the way,
Through tech's wild frontier, come what may,
His wisdom shines like morning dew,
Keeping balance, pure and true,
In cyberspace, his legacy shines through.

7 hours ago | sgc

It has lots of music applications; the MODEP pedalboard by Blokas Labs, for example, uses it I believe.

11 hours ago | asimovfan

> - Useful for stuff like lasers

Now for penguins as well as sharks!

12 hours ago | wkat4242

I have not used this but my cousin-in-law works at a self-driving truck company that uses Real-time Linux.