RAPL stands for Running Average Power Limit. It is a power estimation feature in modern x86 CPUs from Intel and AMD.
In the green software community it is used extensively to get accurate energy measurements for the CPU and DRAM components.
Many papers have examined whether the values reported by RAPL are accurate, both in the energy domain and in the time domain.
RAPL has, however, recently been nerfed in response to a newly discovered power side-channel attack called Platypus. Moritz Lipp showed that it is possible to read the currently executed processor instructions and also memory layouts, in particular for data stored in SGX, Intel's memory enclave that was believed to be secure. SGX (Software Guard Extensions) is a feature that allows the processor to create an enclave in memory that cannot be accessed even by code running in lower (more privileged) access rings than the current code.
Intel reacted directly with a microcode update that distorts the RAPL signal when SGX is enabled on the system. Alternatively, the user may set a register in the processor to activate the so-called energy filtering even when SGX is disabled.
Intel says that the actual RAPL data might be skewed by up to 50% of the original value.
What we have not found so far is any script or data that reproduces this behaviour and shows the distortion in action.
We first tried all the machines we had lying around that, according to Intel, have SGX on the chip or can activate it through Intel ME:
Our Surface Book 1 and 2 have the capability according to Intel, but Microsoft's custom BIOS cannot enable it: https://www.reddit.com/r/Surface/comments/7z1kmz/intel_sgx_extensions_arent_enabled_in_uefi_cant/
On our Fujitsu BIOS there was also no way to enable it, even after an update. Intel ME showed no option either.
We also tried multiple help articles from Intel, but nothing worked:
Intel says that if the setting is not available in the BIOS, nothing can be done: https://community.intel.com/t5/Intel-Software-Guard-Extensions/Enabling-SGX-if-BIOS-doesn-t-provide-Settings/td-p/1256808
We then resorted to the cloud, as there are far more CPUs available to us there than we have at home. Sadly (at least for our case :) ) SGX is usually disabled in cloud environments. AWS even rolls its own enclave technology called Nitro Enclaves.
However, if you rent a bare metal EC2 machine (either the .metal or the largest option) you get access to the CPU registers and can set the energy filtering flag.
It is surprisingly easy to turn energy filtering on. We have put all the scripts in our Tools repository.
Here is the simple command line code to check register 0xbc and see whether energy filtering is active: sudo rdmsr -d 0xbc
A 0 returned means it is off. A 1 returned means it is on.
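The same check can also be done from a program. The following is a minimal Python sketch, assuming a Linux system with the msr kernel module loaded, which exposes each CPU's registers as /dev/cpu/<n>/msr with the register number as the file offset. The function names are made up for illustration, and treating the flag as the lowest bit of the register is an assumption based on the 0 / 1 values described above.

```python
import os
import struct

ENERGY_FILTERING_MSR = 0xBC  # the register queried with rdmsr above


def read_msr(register, cpu=0, device_template="/dev/cpu/{cpu}/msr"):
    """Read one 64-bit MSR; the register number is used as the file offset."""
    fd = os.open(device_template.format(cpu=cpu), os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, register)
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError(f"short read from MSR {register:#x}")
    # x86 is little-endian, so decode as an unsigned 64-bit little-endian value.
    return struct.unpack("<Q", raw)[0]


def energy_filtering_enabled(value):
    """Assumption: the flag is the lowest bit (0 = off, 1 = on)."""
    return bool(value & 1)
```

Calling energy_filtering_enabled(read_msr(ENERGY_FILTERING_MSR)) needs root privileges, just like the rdmsr command. On a multi-socket machine it may be worth checking one CPU per package by passing a different cpu argument.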
Energy filtering can be turned on by issuing a wrmsr command like this: sudo wrmsr 0xbc 1
The results show our runs measuring the CPU RAPL energy consumption of Package 0 and Package 1 (we have a two-chip machine) at idle for 5 minutes with our low-overhead MSR RAPL reporter.
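For context on what an MSR-based reporter has to compute: on Intel CPUs the package energy counter (MSR_PKG_ENERGY_STATUS, 0x611) is a 32-bit value that wraps around, counted in units of 1/2^ESU joules, where ESU is taken from bits 12:8 of MSR_RAPL_POWER_UNIT (0x606). The Python sketch below shows only this conversion. It is not the code of our reporter, and the function names are illustrative.

```python
MSR_RAPL_POWER_UNIT = 0x606    # Intel: unit definitions
MSR_PKG_ENERGY_STATUS = 0x611  # Intel: package energy counter

COUNTER_BITS = 32  # the energy counter occupies the low 32 bits


def energy_unit_joules(power_unit_value):
    """Energy unit is 1 / 2**ESU joules, with ESU in bits 12:8 of the unit MSR."""
    esu = (power_unit_value >> 8) & 0x1F
    return 1.0 / (1 << esu)


def energy_delta_joules(previous, current, unit_joules):
    """Energy between two counter readings, tolerating a single wraparound."""
    mask = (1 << COUNTER_BITS) - 1
    ticks = ((current & mask) - (previous & mask)) & mask
    return ticks * unit_joules
```

Dividing the returned energy by the sampling interval gives the average power over that interval. The sampling interval has to be short enough that the counter cannot wrap more than once between two readings.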
We tested the system only at idle. As we can clearly see in the graphs, the signal without energy filtering is (apart from three small outlier spikes) very close to the mean.
The distorted signal with energy filtering turned on is not only extremely noisy with a very high variance; surprisingly, it also has a higher mean than the non-filtered signal.
We would have expected that at least the mean over a longer period of time would stay the same … but maybe 5 minutes are not enough to get a solid average.
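To put numbers on such a comparison, mean and spread can be computed per measurement series. Below is a minimal Python sketch using only the standard library. The function name and the choice of statistics (mean, sample standard deviation, coefficient of variation) are assumptions for illustration and not the analysis behind our graphs.

```python
import statistics


def summarize(samples):
    """Return mean, sample standard deviation and coefficient of variation."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    # Relative spread; undefined for a zero mean, reported as infinity here.
    cv = stdev / mean if mean else float("inf")
    return {"mean": mean, "stdev": stdev, "cv": cv}
```

Running summarize once on the filtered and once on the non-filtered series makes both effects comparable: a shift in the mean and an increase in the relative spread.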
The effect is so strong that even this setup, which only allows for qualitative conclusions, is sufficient to say that active energy filtering results in an unusable signal. Even the average over a long time might not be of any use … but this needs further investigation.
The takeaway for us is to incorporate guard clauses in all our tools that check for this feature and abort any measurement with an error if it is active.
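Such a guard clause could look roughly like the following Python sketch. It assumes rdmsr (from msr-tools) is installed and that the process runs as root. The exception and function names are hypothetical, and checking only the lowest bit is an assumption based on the 0 / 1 output of the rdmsr check above.

```python
import subprocess


class EnergyFilteringActive(RuntimeError):
    """Raised when RAPL energy filtering is switched on."""


def read_filter_register():
    """Run the rdmsr check for register 0xbc and return its decimal value."""
    result = subprocess.run(
        ["rdmsr", "-d", "0xbc"], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def ensure_energy_filtering_off(read_register=read_filter_register):
    """Guard clause: abort before measuring if the flag reads as on."""
    # Assumption: the flag is the lowest bit of the register.
    if read_register() & 1:
        raise EnergyFilteringActive(
            "RAPL energy filtering is active; measurements would be distorted"
        )
```

A tool would call ensure_energy_filtering_off() once at startup, before the first RAPL reading. Passing the reader in as a parameter keeps the guard testable on machines without MSR access.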