This is fantastic stuff, building these machines out of odd primitives.
They kind of remind me of blind SQL injection attacks: you can inject SQL but can't view the output, so you use things like `pg_sleep()` to get a low-fidelity form of query output. Depending on how far you want to go, you can use different sleep lengths to slowly exfiltrate data.
https://owasp.org/www-community/attacks/Blind_SQL_Injection
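As a toy illustration of that idea, here's a self-contained simulation (not a real attack: `SECRET`, `vulnerable_request()`, and the 200 ms delay are made-up stand-ins for a database behind an injectable endpoint) showing how encoding data in sleep lengths recovers a value one character at a time:

```c
/* Toy simulation of timing-based blind SQL injection. There is no real
 * database here; the "server" sleeps the way an injected pg_sleep() would. */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static const char *SECRET = "hunter2";   /* data the attacker cannot read directly */

/* Stand-in for the vulnerable endpoint. The injected query is roughly:
 *   ... AND CASE WHEN substr(secret, pos, 1) = 'c'
 *                THEN pg_sleep(0.2) ELSE pg_sleep(0) END
 * so the response is slow only when the guess is right. */
static void vulnerable_request(int pos, char guess)
{
    if (pos < (int)strlen(SECRET) && SECRET[pos] == guess)
        usleep(200 * 1000);              /* 200 ms "pg_sleep" on a correct guess */
}

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    char recovered[64] = {0};

    for (int pos = 0; pos < (int)strlen(SECRET); pos++) {
        for (char guess = ' '; guess <= '~'; guess++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            vulnerable_request(pos, guess);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            if (elapsed_ms(t0, t1) > 100.0) {   /* slow response => guess was right */
                recovered[pos] = guess;
                break;
            }
        }
    }
    printf("recovered secret: %s\n", recovered);
    return 0;
}
```

The µWM angle is the same trick one level down: instead of a database sleeping for you, it's the microarchitecture taking a measurably different amount of time.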
I read this for some work I did a few months ago. It's a very interesting idea to try to uncover a computer within a computer. It reminds me of the Atari 2600 emulators in Minecraft [1], or things like using bitwise operators to compute and write arbitrary data to memory as is done in the Pegasus/NSO exploit [2]. But I don't think the literature necessarily implies that these "weird machines" are Turing complete or even especially capable; the definition is more general. From my understanding, a weird machine is essentially an FSM of unintended/weird states in a program, with transitions for moving from weird state to weird state. The point is to construct a weird machine with enough states and transitions to drive a program into a vulnerable state where it can be exploited. Getting something like this with microarchitectural weird machines, the Pegasus exploit, etc. is of course much harder, and more valuable. It will also be interesting to see whether the theory behind weird machines gets used for automated exploit generation in the future.
[1] https://www.youtube.com/watch?v=mq7T5_xH24M
[2] https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-i...
As good a place as any to share: https://matt-rickard.com/accidentally-turing-complete
and the essential: https://gwern.net/turing-complete
This doesn't seem to be useful for hiding the fact that something suspicious is going on. If I understand it correctly, a static analysis of the program would reveal that they are decrypting code with an AES key derived from CPU instruction timings, and then executing that code inside a TSX transaction.
Hardly normal behavior for non-malicious code. The only thing hidden is the actual key that will be used when it runs correctly, and hence what exactly the decrypted code does.
(Also, didn't Intel obsolete TSX already in more recent CPU generations, because of its use for speculative execution side-channels?)
It got obsoleted because of bugs in behaviour, IIRC
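Since TSX came up: for anyone who hasn't used it, here's a minimal RTM sketch of the property that makes a transaction attractive as a hiding place (in the parent's reading of the paper), namely that work inside an aborted transaction is rolled back and leaves no architecturally visible result. This assumes an x86-64 compiler with `<immintrin.h>`, built with `-mrtm`, on a CPU/microcode combination where TSX hasn't been disabled; on recent parts it often has been.

```c
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    int visible = 0;

    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        visible = 42;      /* tentative: only commits if we reach _xend() */
        _xabort(0xff);     /* force an abort: the write above is discarded */
        _xend();
    }

    /* On an abort, execution resumes at the _xbegin() point with an abort
     * status code, and the transactional write never becomes visible. */
    printf("status = 0x%x, visible = %d\n", status, visible);
    return 0;
}
```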
When I read these articles, I always ask myself if this is more of a joint OS-ISA issue than just an ISA problem.
Wondering if a well-defined OS, with strict enforcement of memory boundaries at both the OS level and the application level, where the application sits in a well-defined deterministic execution model, would mitigate some of these unpredictable state transitions.
If one considers a minimalist OS, a microkernel for example, lowering the attack surface, would this not explicitly prevent access to certain microarchitectural states (e.g., by disallowing certain instructions like clflush, or speculative paths)? This could be accomplished with strict memory management jointly at the OS layer and in the binary structure of the application… one where the binary has a well-defined memory boundary, and the OS just ensures it is kept within those limits.
> well defined deterministic execution model would mitigate some of these unpredictable state transitions.
The problem here is that giving a program access to high-resolution (non-virtualized) timers violates deterministic execution. Even without a high-resolution timer, the non-determinism inherent in shared memory parallelism can be exploited to make high-resolution timers. In short, one can use a counter thread to make a very high precision timer.
With high-resolution timers, the timing domain becomes both a potential covert channel and a universal side-channel to spy on other concurrent computations.
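To make the counter-thread trick concrete, here's a minimal sketch using POSIX threads and C11 atomics (build with `-pthread`; it assumes at least two hardware threads so the counter actually advances while the other thread works):

```c
/* One thread spins incrementing a shared counter; another reads the counter
 * before and after the operation it wants to time. No rdtsc or clock_gettime
 * needed, which is why merely hiding hardware timers doesn't help much. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_ulong ticks = 0;
static atomic_bool  stop  = false;

static void *counter_thread(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&stop, memory_order_relaxed))
        atomic_fetch_add_explicit(&ticks, 1, memory_order_relaxed);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, counter_thread, NULL);

    unsigned long before = atomic_load(&ticks);
    volatile double x = 1.0;
    for (int i = 0; i < 1000000; i++)      /* the "operation" being measured */
        x *= 1.0000001;
    unsigned long after = atomic_load(&ticks);

    atomic_store(&stop, true);
    pthread_join(t, NULL);

    printf("operation took ~%lu counter ticks\n", after - before);
    return 0;
}
```

So even if rdtsc and clock_gettime() were taken away, any program that can run two threads over shared memory gets a timer of roughly this quality back.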
Good point, but still, you are leaving the user with too much leverage on the underlying architecture, again from the OS’ perspective.
The way I’m considering this is: one could provide virtual time sources, removing the high-resolution timers, so the OS only exposes a coarse-grained timer. I’m not sure of the implications, but if needed, one could add jitter or randomness (noise) to the virtual timer values (a rough sketch follows this comment)…
This would further prevent a thread from running out of sync with the rest of the threads.
Further, one could also add a stack-based shared memory model; LIFO would provide highly predictable behavior from an application perspective. If you make it per-process, the shared stack would be confined to the application. Not sure if it’s possible (I haven’t given it deep thought), but the stacks could be confined to specific cache lines, removing the timing differences caused by cache contention…
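For what it's worth, here's a toy sketch of what such a coarsened, fuzzed time source could look like. All names and constants are made up; it's roughly the approach browsers took with performance.now() after Spectre, not a claim about any particular OS:

```c
/* Toy sketch of a coarse, fuzzed virtual timer: the raw clock is rounded
 * down to a coarse granularity and a little random noise is added, so
 * callers can no longer resolve nanosecond-scale differences such as a
 * cache hit vs. a miss. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define GRANULARITY_NS 100000   /* 100 us buckets */
#define JITTER_NS       30000   /* up to ~30 us of noise */

static uint64_t real_time_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* What the OS would hand back instead of the raw counter. */
static uint64_t virtual_time_ns(void)
{
    uint64_t t = real_time_ns();
    t -= t % GRANULARITY_NS;              /* coarsen */
    t += (uint64_t)(rand() % JITTER_NS);  /* fuzz */
    return t;
}

int main(void)
{
    srand((unsigned)time(NULL));
    uint64_t a = virtual_time_ns();
    uint64_t b = virtual_time_ns();

    /* Note: naive jitter like this can make the clock appear to run
     * backwards; a real implementation would clamp it to stay monotonic. */
    printf("two back-to-back reads differ by %lld ns (real gap: tens of ns)\n",
           (long long)(b - a));
    return 0;
}
```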
My career started in the early 80s. For about 30 years, I felt pretty confident that I understood all the major trends in this business. I have to admit that I didn’t even know exploits like this were possible.
I know, right! This is some serious 200IQ stuff.
Microarchitectural weird machines: where your CPU's quirks are both the hackers' playground and the defenders' nightmare. Who knew speculative execution could be this creative?
In short, timing-attack computed?
It's more complicated than that, but I find it hilarious that people are trying to find utility in metastability problems on modern architectures.
https://en.wikipedia.org/wiki/Metastability_(electronics)
Arguably, the concept is the ultimate reduction of the absurd claim 'It's Not a Bug, It's a Feature.'
Seems some crowds sell the facade of repeatable correctness, rather than acknowledge their design is fundamentally an unreliable Beta. =3
This research definitely views it as a threat to the interests of the computer owners, because attackers can use it to hide malicious behavior better. It's just that it's cool and interesting to explore how and why. It's not like "it's so awesome and desirable to have this behavior here".
I browsed the article but didn't see any reference to increased power use or efficiency. Would such a proposed scheme only be used for cryptography and the remainder of the program run conventionally?
You're right to be curious about the power implications of µWMs. Unfortunately, the article doesn't go into power consumption or efficiency specifically, probably because the research is still at an early stage, focused on proving the concept and exploring its potential.
As you suggested, a hybrid approach seems the most likely scenario for practical applications of µWMs: the bulk of a program would execute using conventional instructions and pathways, minimizing the power overhead.
It's an obfuscation technique, not a way to improve efficiency.
Now you got me curious about using processor bugs and the once popular use of invalid instructions in the Z80 and 6502 days. Do modern OSs guard against exploiting architectural misfeatures?
Modern processors[1] cause an exception on invalid opcodes, instead of performing some undocumented function. They also have control bits to enable/disable features like being able to read certain "system" registers from userspace.
User code generally can't directly violate security (like writing to memory in the kernel or a more privileged process) just by running some instruction; however, there are timing side channels that can be used to leak information. The terms to search for are "Spectre" and "Meltdown".
The timestamp counter is one of the registers that an OS can prevent software from reading, but mainstream ones still don't do this AFAIK. Perhaps it would be better to only enable it for processes that have a legitimate reason to need a high-resolution timer.
And of course, x86 has accumulated enough legacy features that you could use to confuse a person reading your code, my user name is one such instruction ;)
[1] pretty much everything newer than the original 8086
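To make the first point above concrete, here's a small demo (assumes Linux on x86-64 with GCC or Clang inline asm) showing that an undefined opcode simply traps as SIGILL instead of doing anything undocumented:

```c
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf after_fault;

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(after_fault, 1);      /* jump back out of the faulting spot */
}

int main(void)
{
    signal(SIGILL, on_sigill);

    if (sigsetjmp(after_fault, 1) == 0) {
        /* ud2 is the architecturally guaranteed way to raise #UD;
         * truly undefined byte sequences take the same exception path. */
        __asm__ volatile("ud2");
        puts("this line is never reached");
    } else {
        puts("got SIGILL: the CPU trapped the undefined opcode");
    }
    return 0;
}
```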
They no longer have stable undocumented instructions like the Z80 had (the 6502 lost them with the 65C02), but they still have sizable published errata explaining which legal instructions don't work as expected under which conditions. Also, I remember this tool: https://github.com/Battelle/sandsifter
So if I have a dumb pattern matcher on RAM, it can prefetch by looking at the requested instructions and data?