Doubtless Computing is apparently the blogger's new startup.
He previously co-founded the CPU startup VyperCore, which was based on technology from his PhD thesis at the University of Bristol. The startup folded earlier this year.[1]
VyperCore's main selling point was that doing garbage collection in hardware could run managed languages faster, and with less energy. Apparently they got as far as running code on an FPGA. Aside from the object-addressing hardware, it was based on a RISC-V core.
I wonder which assets (patents and otherwise) the new company has been able to acquire from the old.
Both companies were/are apparently targeting the data centre first... which I think is a bit bold, TBH. I follow RISC-V, and have seen maybe a dozen announcements of designs for wide-issue cores that on paper could have been competitive with AMD and Intel... if only they had been manufactured on a competitive process. But that would require significant investment.
1. https://ednutting.substack.com/p/vypercore-failed
A pretty reasonable summary of things from an outside perspective - have my thumbs up ;)
(And a very good question, to be answered at a later stage.)
The MMU leads to horribly leaky operating system abstractions. IME it's leaky because of the lack of separation between address-space remapping (preventing fragmentation) and memory protection (security).
Perhaps unintentionally, RISC-V provides more flexibility to kernel developers by also including a physical memory protection unit that can run underneath and simultaneously with the MMU. This can make it far cheaper to switch memory protection on and off for arbitrarily sized areas of physical memory since this capability is no longer needlessly coupled to expensive memory remapping operations. Kernels can move away from the typical “process is its own virtual computer” model, and towards things closer to single address space designs or some middle ground between the two.
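For a concrete sense of how cheaply PMP describes protected regions, here's a small Python model of the spec's NAPOT (naturally aligned power-of-two) encoding of a pmpaddr register. Illustrative only - real configuration happens by writing the pmpcfg/pmpaddr CSRs from machine mode:

```python
def napot_decode(pmpaddr):
    """Decode a PMP address register in NAPOT mode.

    pmpaddr holds address bits [XLEN+1:2]; the count of trailing
    1 bits (G) selects a region of 2**(G+3) bytes.
    """
    g = 0
    while (pmpaddr >> g) & 1:
        g += 1
    size = 1 << (g + 3)
    base = (pmpaddr & ~((1 << (g + 1)) - 1)) << 2
    return base, size

def napot_encode(base, size):
    """Encode base/size (size a power of two >= 8, base size-aligned)."""
    assert size >= 8 and size & (size - 1) == 0
    assert base % size == 0
    return (base >> 2) | ((size >> 3) - 1)
```

A single CSR write flips protection for an arbitrarily large aligned region, with no page-table walk or TLB shootdown involved - which is the cheapness the comment above is pointing at.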
Thinking of Bunnie Huang's open-source hardware efforts, a 28nm no-MMU system seems like a good long-term open-source target. How much of the system's complexity is the MMU, and so how much complexity could we cut out while still being able to run third-party untrusted code?
A third path: SPARC Application Data Integrity (ADI)
https://docs.oracle.com/en/operating-systems/solaris/oracle-...
Although I do concede, most folks aren't keen on picking up anything related to Oracle or Solaris nowadays.
I haven't come across this specific feature before. From reading about it, it seems closely related to Arm's (E)MTE ISA extensions (the Memory Tagging Extension)?
What's interesting is that this approach (software-defined 'random' numbers associating memory regions with valid pointers) provides only probabilistic memory safety. A malicious actor may find a way to spoof or guess the tag needed to access a particular piece of memory. Given that Arm MTE has been breached in the last year, it's hard to argue that it's a good enough security guarantee. EMTE may fix specific issues (e.g. side channels) but leaves open the probabilistic pathway (i.e. "guess the tag"), which is a hole MTE isn't designed to close. So a software breach on top of a chip with EMTE can't necessarily be argued to be a violation of the hardware's security properties, even though it exploits the architectural security hole.
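To put a number on "probabilistic": standard MTE tags are 4 bits, so a single blind guess succeeds 1 in 16 times, and retries compound quickly. A quick back-of-the-envelope sketch (tag width per the Arm MTE spec; the independent-retry model is a simplifying assumption):

```python
# Chance that at least one of n independent guesses at an MTE tag
# is correct. With 4-bit tags, one guess wins 1/16 of the time.
def p_breach(n, tag_bits=4):
    miss = 1.0 - 1.0 / (1 << tag_bits)
    return 1.0 - miss ** n

# An attacker who can retry (respawned workers, async tag checking)
# erodes the guarantee fast: ~50% odds after 11 tries, >98% after 64.
odds = {n: p_breach(n) for n in (1, 11, 64)}
```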
In contrast, CHERI and OMA (Object Memory Architecture) both provide hardware-enforced guarantees of memory-safety properties - unbreakable even if the attacker has perfect knowledge - backed by formal proofs of these claims.
CHERI offers referential and spatial safety as hardware guarantees, with temporal safety achievable in software. OMA offers referential, spatial and temporal safety as hardware guarantees.
Kind of, with the difference that it has been in production since 2015 on Solaris SPARC systems, granted they aren't as widespread as they once were.
Sometimes the perfect is the enemy of the good: none of the memory-tagging solutions has achieved mainstream adoption outside iDevices.
Google apparently doesn't want to anger Android OEMs by requiring it in Android, so it remains a Pixel-only feature.
CHERI and OMA are still going to take years to reach mainstream adoption, if they ever do.
I had hopes that whatever Microsoft was doing with CHERIoT would eventually come to Windows in some fashion, but the best that has happened seems to be the adoption of Pluton in Copilot+ PCs, which serves a different purpose anyway.
There doesn't seem to be much info about OMA available online. The link to your thesis from https://www.bristol.ac.uk/research/groups/trustworthy-system... (which is linked from your home page/timeline) is broken. Perhaps https://dl.acm.org/doi/fullHtml/10.1145/3450147 is the best in-depth info available currently? Looking forward to future developments and success!
Oh dear, I hadn't realised Bristol Uni had broken the link. That paper has some information, as does my UG thesis: https://ia600408.us.archive.org/22/items/archive_IHGC/Thesis...
Yeah, the currently closed nature of OMA means there's limited information at present. I am working on publishing more over the next year. It is essential that the wider community gets access to at least the modified RISC-V ISA, to independently validate the security claims.
Can you please provide sources about Arm EMTE being breached? I couldn’t find any information online.
Could we also consider just not connecting critical systems to the internet at large? No reason, for example, for the Jaguar assembly line to depend on an internet connection.
Yep, air-gapped security is a thing. But servicing, patching and communicating with air-gapped devices still needs to happen, which means connecting to them somehow. Likely a manufacturing plant has connected everything together to streamline the process and created an attack surface that way.
You can see the appeal of not having to go through all the issues, complexity and costs that entails.
I suppose we could all do what Asahi have been forced to do and go back to using pens, paper and fax machines: https://www.bbc.co.uk/news/articles/cly64g5y744o
How else do you expect to move the information around between sites and use it?
Malware can hop through airgaps on USB keys, so that's not enough: https://en.wikipedia.org/wiki/Stuxnet
Related discussions:
- CHERI with a Linux on Top https://news.ycombinator.com/item?id=45487629
- Why not object capability languages? https://news.ycombinator.com/item?id=43956095
- Ask HN: Why isn't capability-based security more common? https://news.ycombinator.com/item?id=45261574
Also the discussion of Apple's "Memory Integrity Enforcement" announcement https://news.ycombinator.com/item?id=45186265, which attracted CHERI people like jrtc27.
Unlikely that new HW will be the solution.
You can have a memory safe Linux userland today in stock hardware. https://fil-c.org/pizlix
Fil-C is basically CHERI in software, with large speed and memory overhead.
Fil-C running on any x86 box is faster than any CHERI implementation that has ever existed.
That's likely to be true in embedded also, just because of the relationship between volume and performance in silicon. Fil-C runs on the high volume stuff, so it'll get better perf.
CHERI doubles the size of pointers, so it's not like it has a decisive advantage over Fil-C.
> Fil-C running on any x86 box is faster than any CHERI implementation that has ever existed.
Heh. I don't doubt it. Just like RISC-V in QEMU on an x86 box is faster than any RISC-V core that anyone can get their hands on... but only so far.
Do you even have access to CHERI hardware Phil?
No, and I don't need to in order to know that my claim is accurate.
Also, the fact that having access to CHERI hardware is a thing you presume I don't have should tell you a lot about how likely CHERI is to ever succeed
But it seems on track to move from "large" to merely "significant" speed and memory overhead. It is already really useful, especially for running tests in pipelines.
> Fil-C is basically CHERI in software
It's not, actually.
Fil-C is more compatible with C/C++ than CHERI, because Fil-C doesn't change `sizeof(void*)`.
Fil-C is more compatible in the sense that I can get CPython to work in Fil-C and to my knowledge it doesn't work on CHERI.
Fil-C also has an actual story for use-after-free. CHERI's story is super weak
If I understand Fil-C's InvisiCaps correctly, they do not allow capability restriction (since metadata is stored at the beginning of each object), while with CHERI one can take a pointer/capability for an object, restrict it to a capability for a sub-object, pass that to a callee (via a function call), and the callee can access only the sub-object.
This also means Fil-C seems not to be very helpful when a program uses its own allocators on top of malloc() or page allocation from the OS, while with CHERI this works naturally through capability restriction.
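A toy Python model of the restriction pattern being described (purely illustrative - real CHERI capabilities are 128-bit hardware values with compressed bounds and a validity tag): a callee handed a narrowed capability can touch only the sub-object, and narrowing is monotonic.

```python
class Capability:
    """Toy model of a CHERI-style bounded pointer: bounds can only
    shrink via restrict(), never grow."""
    def __init__(self, mem, base, length):
        self.mem, self.base, self.length = mem, base, length

    def restrict(self, offset, length):
        # Narrowing must stay within the existing bounds.
        if offset < 0 or offset + length > self.length:
            raise ValueError("cannot widen a capability")
        return Capability(self.mem, self.base + offset, length)

    def load(self, i):
        if not 0 <= i < self.length:
            raise IndexError("bounds violation")  # hardware would trap
        return self.mem[self.base + i]

mem = bytearray(b"header--payload!")
whole = Capability(mem, 0, len(mem))
payload = whole.restrict(8, 8)  # callee sees only bytes 8..15
```

In this picture, a custom allocator under CHERI hands out restrict()-ed views of its arena, which is why it keeps working there - and why Fil-C instead wants the custom allocator removed, so that Fil-C's own allocator supplies the per-object metadata.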
Yeah this is a benefit of CHERI.
The fact that InvisiCaps don’t support capability restriction is a choice though. I could have made them support that feature. I didn’t because I’m going for C/C++ compatibility and the only way to make meaningful use of capability restriction is to change people’s code.
Worth noting that if a program has its own allocator then you have to make more than zero changes in both Fil-C and in CHERI to get the benefit:
- In Fil-C you just get rid of the custom allocator.
- In CHERI you make the allocator do capability restriction, which is actually kinda hard to get right (if the allocator’s metadata doesn’t also do capability restriction and the allocator uses inline free lists then that could easily turn into a full blown capability escape).
> Fil-C is more compatible in the sense that I can get CPython to work in Fil-C and to my knowledge it doesn't work on CHERI.
MicroPython has been ported, though. What makes CPython special, such that it couldn't be ported?
> Fil-C also has an actual story for use-after-free. CHERI's story is super weak
Indeed, CHERI does not always trap on use-after-free if the program is fast enough: a freed memory object is kept until another thread has found all pointers to the object and made them invalid.
If I understood the paper right, Cornucopia Reloaded invalidates a capability directly when it is loaded if the pointed-to page is marked free. Therefore, any allocation of at least a page should have no use-after-free window.
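A sketch of the quarantine-and-sweep scheme as I read the Cornucopia line of work (details of the real design differ): free() only quarantines the address range, and a sweeper pass must invalidate every capability still pointing into quarantined memory before the range can be reused.

```python
class Cap:
    def __init__(self, addr):
        self.addr, self.valid = addr, True

class Heap:
    """Toy model of sweeping revocation: freed ranges sit in
    quarantine until a sweep clears all dangling capabilities."""
    def __init__(self):
        self.caps, self.quarantine = [], []

    def alloc(self, addr):
        cap = Cap(addr)
        self.caps.append(cap)
        return cap

    def free(self, cap):
        self.quarantine.append(cap.addr)  # not reusable yet

    def sweep(self):
        # Analogue of the revocation pass over registers and memory.
        for cap in self.caps:
            if cap.addr in self.quarantine:
                cap.valid = False  # tag cleared; later use traps
        reclaimed, self.quarantine = self.quarantine, []
        return reclaimed  # only now may these addresses be reused

heap = Heap()
a = heap.alloc(0x1000)
heap.free(a)
# Between free() and sweep(), the stale cap still "works" -- the
# race window the parent comment describes.
heap.sweep()
```

The load-barrier variant closes that window for page-sized allocations by checking the freed-page mark at capability-load time instead of waiting for the sweep.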
> What makes CPython special, so it couldn't be ported?
I don't know, but I'm guessing it's the pointer shenanigans that happen in the CPython bytecode.
" Rather than extending paged memory, OMA implements object-based memory management directly in hardware. Every allocation becomes a first-class hardware object with its own identity, bounds, and metadata maintained by the processor itself."
what is this supposed to mean? like a whole new isa + kernel + userland?
My interpretation, going off the linked integrated-GC research: extensions to the ISA and thus the compiler backend, no modifications to 'well-formed' applications, and some changes to the language runtime's memory management.
Unless the CPU hardware becomes some kind of hybrid monster with both OMA and traditional paged MMU, you will need to make changes to the kernel. You may be able to emulate some of the kernel's page table shenanigans with the object-based system, but I don't think that the kernel's C code is typically 'well-formed'. It's probably a lot of engineering effort to make the necessary kernel changes, but so are all those complex kernel hardening efforts that hardware-level memory security like OMA would render moot.