This has been a sore point in a lot of discussions regarding compiler optimizations and cryptographic code: how compilers and compiler engineers are sabotaging the efforts of cryptographers in making sure there are no side-channels in their code. The issue has never been the compiler, and has always been the language: there was never a way to express the right intention from within C (or most other languages, really).
This primitive we're trying to introduce is meant to make up for this shortcoming without having to introduce additional rules in the standard.
What happened to the blog post? It was moved and now it has disappeared :-(
Archived copy here: https://archive.is/tIXt7
>how compilers and compiler engineers are sabotaging the efforts of cryptographers
I'm not exposed to this space very often, so maybe you or someone else could give me some context. "Sabotage" is a deliberate effort to ruin/hinder something. Are compiler engineers deliberately hindering the efforts of cryptographers? If yes... is there a reason why? Some long-running feud or something?
Or, through the course of their efforts to make compilers faster/etc, are cryptographers just getting the "short end of the stick" so to speak? Perhaps forgotten about because the number of cryptographers is dwarfed by the number of non-cryptographers? (Or any other explanation that I'm unaware of?)
It's more a viewpoint thing. Any construct cryptographers find that runs in constant time is something that could be optimized to run faster for non-cryptographic code. Constant-time constructs essentially are optimizer bug reports. There is always the danger that by popularizing a technique you are drawing the attention of a compiler contributor who wants to speed up a benchmark of that same construct in non-cryptographic code. So maybe it's not intended as sabotage, but it can sure feel that way when everything you do is explicitly targeted to be changed after you do it.
It’s not intentional. The motivations of CPU designers, compiler writers, and optimizers are at odds with those of cryptographers. The former want to use every trick possible to squeeze out additional performance in the most common cases, while the latter absolutely require indistinguishable performance across all possibilities.
CPUs love to do branch prediction so that computation is already performed when the branch is guessed correctly, but cryptographic code needs equal performance no matter the input.
When a programmer asks for some register or memory location to be zeroed, they generally just want to be able to use a zero in some later operation, so it doesn’t really matter whether the previous value was actually overwritten. When a cryptographer does, they generally are trying to make it impossible to read the previous value. And they want to be able to have some guarantee that it wasn’t implicitly copied somewhere else in the interim.
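To make the zeroing point concrete, here's a minimal sketch (in C, with a helper name I made up) of the workaround libraries typically reach for today: a plain memset of a buffer that is never read again is dead code to the optimizer and may be removed, so the call is routed through a volatile function pointer the compiler can't see through. Note this still says nothing about copies spilled elsewhere, which is the second half of the problem above.

    #include <string.h>

    /* Hypothetical helper. Calling memset through a volatile function
       pointer keeps the optimizer from proving the store is dead and
       deleting it. It does NOT prevent earlier register spills or copies. */
    static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

    static void secure_zero(void *p, size_t n)
    {
        memset_ptr(p, 0, n);
    }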
“Sabotage” can be used in a figurative sense that doesn’t insinuate intent. An adjacent example is “self-sabotage”, which doesn’t imply intent.
Since the sibling comment is dead and thus I can’t reply to it: Search for “unintentional sabotage”, which should illustrate the usage. Despite appearances, it isn’t an oxymoron. See also meaning 3a on https://www.merriam-webster.com/dictionary/sabotage.
[dead]
> making sure there are no side-channels in their code
Any side effect is a side channel. There are always going to be side channels in real code running on real hardware.
Sure, you can change your code, compiler, or even hardware to account for this, but at its core that is security by obscurity.
So this makes me curious: is there a reason we don't do something like a __builtin_ct_begin()/__builtin_ct_end() set of intrinsics? Where the begin intrinsic begins a constant-time code region, all code within that region must be constant-time, and that region must be ended with an end() call? I'm not too familiar with compiler intrinsics or how these things work, so thought I'd ask. The intrinsic could be scoped such that the compiler can use its implementation-defined-behavior freedom to enforce the begin/end pairs. But Idk, maybe this isn't feasible?
Maybe it would be better implemented as a function attribute instead.
Or a pragma? Like how OpenMP did it?
I think __builtin_ct_select and __builtin_ct_expr would be good ideas. (They could also be implemented in GCC in future, as well as LLVM.)
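For reference, the kind of hand-rolled idiom a __builtin_ct_select would stand in for looks roughly like this (a sketch of the common masking trick, not the proposed builtin itself); the catch, and the motivation for a builtin, is that nothing stops a compiler from recognizing this pattern and turning it back into a branch.

    #include <stdint.h>

    /* Build an all-ones or all-zeros mask from the condition and blend the
       two inputs with it, so no branch or data-dependent load is needed. */
    static uint64_t ct_select_u64(uint64_t cond, uint64_t a, uint64_t b)
    {
        uint64_t mask = (uint64_t)0 - (cond != 0);  /* all 1s if cond != 0 */
        return (a & mask) | (b & ~mask);
    }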
In some cases it might be necessary to consider the possibility of invalid memory accesses (and avoid the side-channels when doing so). (The example given in the article works around this issue, but I don't know if there are any situations where this will not help.)
The side channel from memory access timings is exactly why cmov is its own instruction on x86_64. It retrieves the memory regardless of the condition value; anything else would change the timings based on the condition. If you're going to segfault, that's going to be visible to an attacker regardless, because you're going to hang up.
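For what it's worth, one way code pins that behavior today (common practice as I understand it, not something from the article) is to emit the cmov itself via inline asm, which the compiler won't rewrite into a branch. x86_64 with GCC/Clang only, and the helper name is made up for the sketch:

    #include <stdint.h>

    /* Returns a if cond is nonzero, else b, forcing a cmov so the compiler
       cannot turn the selection into a conditional branch. */
    static uint64_t cmov_u64(uint64_t cond, uint64_t a, uint64_t b)
    {
        uint64_t result = b;
        __asm__ ("test %[c], %[c]\n\t"
                 "cmovnz %[a], %[r]"
                 : [r] "+r" (result)
                 : [c] "r" (cond), [a] "r" (a)
                 : "cc");
        return result;
    }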
AFAIU, cmov wasn't originally intended to be a guaranteed constant-time operation, and Intel and AMD won't commit to keeping it constant-time in the future; it just so happened that at one point it was implemented in constant time across CPUs, cryptographers picked up on this and began using it, and now Intel and AMD tacitly recognize this dependency. See, e.g., https://www.intel.com/content/www/us/en/developer/articles/t...
> The CMOVcc instruction runs in time independent of its arguments in all current x86 architecture processors. This includes variants that load from memory. The load is performed before the condition is tested. Future versions of the architecture may introduce new addressing modes that do not exhibit this property.
I mean the possibility that the rest of the program guarantees that the address is valid if the condition is true but otherwise it might be valid or invalid. This is probably not important for most applications, but I don't know if there are some unusual ones where it would matter.