RAPL, SGX and energy filtering - Influences on power consumption
- by Arne Tarara

RAPL stands for Running Average Power Limit. It is a power estimation feature in modern x86 CPUs from Intel and AMD.

In the green software community it is used extensively to get accurate energy measurements for the CPU and DRAM components.

Many papers have examined whether RAPL values are accurate, both in the energy domain and in the time domain:

However, RAPL has recently been nerfed in response to a newly discovered power side-channel attack named Platypus. Moritz Lipp showed that it is possible to read the instructions the processor is currently executing and also memory layouts, especially for data stored in SGX, the memory enclave from Intel that was believed to be secure. SGX (Software Guard Extensions) is a feature that allows the processor to create an enclave in memory that cannot be accessed even by code running in lower (more privileged) access rings than the current code.

Intel reacted directly with a microcode update that distorts the RAPL signal when SGX is enabled in the system. Alternatively, the user can set a register in the processor to activate the so-called energy filtering even when SGX is disabled.

Intel says that the actual RAPL data might be skewed by up to 50% of the original value.

Intel energy filtering - Source Intel

What we have not found so far is any script or data that reproduces this behaviour and shows the distortion in action.

What do we want to find out?

Research question
How can energy filtering be reproduced on Intel CPUs, and what does the distorted RAPL data look like?

Finding an SGX enabled or energy filtering enabled machine

We first tried all the machines we had lying around that, according to Intel, had SGX on the chip or activatable through Intel ME:

We then resorted to the cloud, as there are far more CPUs available to us there than we have at home. Sadly (at least for our case :) ) SGX is usually disabled in cloud environments. AWS even rolls its own enclave called Nitro Enclaves.

However, if you rent a bare metal EC2 machine (either the .metal or the largest option) you get access to the CPU registers and can set the energy filtering flag.

Checking and activating energy filtering

It is surprisingly easy to turn energy filtering on. We have put all the scripts in our Tools repository.

Here you find the simple command-line code (sudo rdmsr -d 0xbc) that reads register 0xbc to check whether energy filtering is active. A returned 0 means it is off. A returned 1 means it is on.

Energy filtering can be turned on by issuing a wrmsr command like this: sudo wrmsr 0xbc 1
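The same check can be done without the rdmsr tool by reading the register through the Linux msr driver. The following is a minimal sketch, not the script from our repository. It assumes the msr kernel module is loaded, that the process runs as root, and that only the lowest bit of register 0xbc matters for energy filtering. The function names are made up for illustration.

```python
import os
import struct

ENERGY_FILTERING_MSR = 0xBC  # register address used in this article


def decode_energy_filtering(raw: bytes) -> bool:
    """Interpret the 8 raw MSR bytes; assumption: bit 0 set means filtering is on."""
    (value,) = struct.unpack("<Q", raw)
    return bool(value & 1)


def read_msr(register: int, cpu: int = 0) -> bytes:
    """Read one 64-bit MSR via /dev/cpu/<cpu>/msr (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the register address.
        return os.pread(fd, 8, register)
    finally:
        os.close(fd)


def energy_filtering_enabled(cpu: int = 0) -> bool:
    """Return True if energy filtering appears to be active on the given CPU."""
    return decode_energy_filtering(read_msr(ENERGY_FILTERING_MSR, cpu))
```

Calling energy_filtering_enabled() on a machine like the m5.metal instance should then mirror the 0 or 1 that rdmsr prints.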

If you also want to check whether SGX is active, we have copied the C code for checking SGX, originally from ayeks, into our repository.
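As a rough first look before compiling the C code, the CPU flags reported by Linux can be inspected. This is a different and weaker technique than the CPUID-based check from ayeks: it assumes a reasonably recent kernel that lists an sgx flag in /proc/cpuinfo, and a missing flag does not prove that SGX is absent. The helper name is hypothetical.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def sgx_flag_present(path: str = "/proc/cpuinfo") -> bool:
    """Assumption: the kernel exposes SGX support as the 'sgx' flag."""
    with open(path, encoding="utf-8") as handle:
        return "sgx" in cpu_flags(handle.read())
```

If sgx_flag_present() returns False, the C check is still the more reliable answer.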

Results: Looking at energy filtering signal distortion

The results show runs in which we recorded the CPU RAPL energy consumption of Package 0 and Package 1 (we have a two-chip machine) on idle for 5 minutes with our low-overhead MSR RAPL reporter.
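To illustrate what an MSR-based RAPL reading involves, here is a minimal sketch of the decoding step. It is not our reporter. It assumes the raw values come from Intel's MSR_RAPL_POWER_UNIT (0x606) and MSR_PKG_ENERGY_STATUS (0x611) registers, that the energy unit sits in bits 12:8 of the former, and that the energy counter is 32 bits wide and wraps around. The function names are made up for illustration.

```python
MSR_RAPL_POWER_UNIT = 0x606    # holds the scaling units
MSR_PKG_ENERGY_STATUS = 0x611  # holds the package energy counter


def energy_unit_joules(power_unit_raw: int) -> float:
    """Energy per counter tick: (1/2)^ESU joules, with ESU in bits 12:8."""
    esu = (power_unit_raw >> 8) & 0x1F
    return 0.5 ** esu


def energy_delta_joules(previous_raw: int, current_raw: int, unit: float) -> float:
    """Energy consumed between two counter reads, tolerating one 32-bit wraparound."""
    ticks = (current_raw - previous_raw) & 0xFFFFFFFF
    return ticks * unit
```

Reading the counter at a fixed interval and dividing each delta by the interval length gives the power signal that is plotted per package.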

Package 0 RAPL in EC2 m5.metal on idle. Energy filtering OFF
Package 1 RAPL in EC2 m5.metal on idle. Energy filtering OFF
Package 0 RAPL in EC2 m5.metal on idle. Energy filtering ON
Package 1 RAPL in EC2 m5.metal on idle. Energy filtering ON

Raw data:

  • [ec2_m5.metal_idle_p0_energy_filtering_off.csv]({{- "files/ec2_m5.metal_idle_p0_energy_filtering_off.csv" | relLangURL -}})
  • [ec2_m5.metal_idle_p1_energy_filtering_off.csv]({{- "files/ec2_m5.metal_idle_p1_energy_filtering_off.csv" | relLangURL -}})
  • [ec2_m5.metal_idle_p0_energy_filtering_on.csv]({{- "files/ec2_m5.metal_idle_p0_energy_filtering_on.csv" | relLangURL -}})
  • [ec2_m5.metal_idle_p1_energy_filtering_on.csv]({{- "files/ec2_m5.metal_idle_p1_energy_filtering_on.csv" | relLangURL -}})

We tested the system only on idle. As the graphs clearly show, the signal with energy filtering off is (apart from three small outlier spikes) very close to the mean.

The distorted signal with energy filtering turned on is not only extremely noisy with a very high variance; surprisingly, it also has a higher mean than the non-filtered signal.

We would have expected that at least the mean over a longer period of time would stay the same … but maybe 5 minutes is not enough to get a solid average.
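The comparison of mean and variance can be sketched in a few lines. This is only an illustration under assumptions: the samples are passed in as plain lists of numbers, because the column layout of the CSV files is not described here, and the function names are hypothetical.

```python
import statistics


def summarize(samples):
    """Return mean, population standard deviation and coefficient of variation."""
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    cv = stdev / mean if mean else float("nan")
    return {"mean": mean, "stdev": stdev, "cv": cv}


def relative_mean_shift(filter_off, filter_on):
    """How far the filtered mean lies above the unfiltered mean, as a fraction."""
    mean_off = statistics.fmean(filter_off)
    mean_on = statistics.fmean(filter_on)
    return (mean_on - mean_off) / mean_off
```

Running summarize on the filtered and unfiltered series of one package, and relative_mean_shift on the pair, puts numbers on the two effects seen in the graphs: the spread and the offset of the mean.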

The effect is so strong that even this setup, which allows only qualitative conclusions, is sufficient to say that active energy filtering results in an unusable signal. Even the average over a long time might not be of any use … but this needs further investigation.

The takeaway for us is to incorporate guard clauses in all our tools that check for this feature and abort any measurement with an error if it is active.
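Such a guard clause could look roughly like the following sketch. It is not the code from our tools. It assumes the rdmsr command from above is installed and that the process runs with root privileges, and the exception and function names are made up for illustration.

```python
import subprocess


class EnergyFilteringActiveError(RuntimeError):
    """Raised when RAPL readings would be distorted by energy filtering."""


def rdmsr_decimal(register: str = "0xbc") -> int:
    """Run `rdmsr -d <register>` and return the printed decimal value."""
    result = subprocess.run(
        ["rdmsr", "-d", register], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def ensure_energy_filtering_off(read_value=rdmsr_decimal) -> None:
    """Abort with an error if the lowest bit of the register is set."""
    if read_value() & 1:
        raise EnergyFilteringActiveError(
            "Energy filtering is active; RAPL measurements would be distorted."
        )
```

A measurement tool would call ensure_energy_filtering_off() once at startup, before taking the first RAPL sample. Passing the reader as a parameter keeps the guard testable on machines without MSR access.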