Did you know every one of your rack-mount servers has a literal turbo button, and you probably haven't pressed it?
Back in the 1980s, the hugely popular Intel 8086 and 8088 processors had a standard clock frequency of 4.77 MHz. This stayed consistent long enough that many games and even some device drivers were written to rely on that specific timing. In principle, the developers should have used the real time clock (RTC) like modern games do, but reading the clock back then was expensive, so a common “optimisation” was to rely on the CPU frequency alone. Progress inevitably marched on and processor frequencies increased. Since game speeds were tied to the frequency, they sped up like a video tape on fast forward, and most such games became essentially unplayable. For the sake of backwards compatibility, PC manufacturers added a “Turbo” button on the front of the case that could be used to switch between the legacy 4.77 MHz frequency and whatever speed the upgraded processors were actually capable of.
I have fond memories from my childhood of visiting friends and pressing the turbo button on their PCs. The shock on their faces when the computer they'd had for years magically sped up by a factor of two or more was just priceless. There were a few holdouts, however. I vividly remember one friend getting very upset that I had done something “bad” to his PC, insisting that I undo “whatever I had done”. I quickly switched the turbo off because it looked like he was about to cry any second.
It's 2020 now – the actual future – and nothing has changed. There's still a turbo button on all Intel-based PCs and servers, except instead of a physical button, it's all software controlled. Unfortunately, it's off by default, so I still get to run around gleefully pressing it. I still see many people hesitate to approve the change, despite clear-cut benefits that are simply too good to ignore.
Largely, this is the “fault” of green computing initiatives such as the United States federal ENERGY STAR program, as well as various local programs in Europe and California. Because so many jurisdictions require some form of energy-efficiency features, all manufacturers enable these features by default. The features go by various names, but for CPUs they are most commonly referred to as ACPI P-states and C-states, and generically as “Power Management” or something similar for all other devices.
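If you're curious whether the software turbo button on a given box has been pressed, the quickest thing to check is how the operating system is driving those P-states. Here's a minimal sketch, assuming a Linux host that exposes the cpufreq sysfs interface (on Windows the equivalent knob is the power plan, on VMware the host power policy):

```python
# Minimal sketch (assumes Linux with the cpufreq sysfs interface).
# Prints each core's frequency governor; most distros ship with
# "powersave" or "ondemand" rather than "performance" by default.
from pathlib import Path

for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
    gov = cpu / "cpufreq" / "scaling_governor"
    if gov.exists():
        print(f"{cpu.name}: {gov.read_text().strip()}")

# "Pressing the turbo button" amounts to writing "performance" into each
# of those scaling_governor files (as root), plus disabling the deeper
# C-states in the BIOS or via your vendor's power profile.
```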
At first, this wasn't that big a deal. In the early 2000s Intel-based servers had only the SpeedStep feature, which reduced clock speeds but kept all CPU cores operational at all times. Idle servers would use less power, but quickly returned to full speed if they had work to do. Also, server networking back then was typically 100 Mbps Ethernet and all disks were mechanical, so the additional wakeup delays were lost in the noise. Turning off the green power saving mode just increased your electricity bill and didn't do much of anything for application performance.
Things have changed since then: server networking starts at 10 Gbps and SSDs with read latencies of a few tens of microseconds are commonplace. The wakeup time from a processor's deep sleep state is now an eternity compared to the speed of the rest of the data centre environment. In principle, most server CPUs can switch between the various power-saving states in as little as a microsecond, but the real problem is the newer ACPI C-states that allow processor cores to be entirely powered off. If work is assigned to such a core, the wakeup time can be much higher. These wakeup times from deep sleep states are not well advertised, but can be on the order of a hundred microseconds or more. No matter how fast the wakeup, C3 and deeper sleep states completely power off the core, which forces its caches to be flushed, then reinitialised and resynchronised with the other cores when it wakes – that inherently takes time. On top of that, most power-management implementations will not switch a core directly from deep sleep to its maximum clock speed, but instead “ramp up” incrementally through faster and faster clock speeds.
In practice, wakeup times of 50-200 μs are common, and I regularly see server processors “stuck” at clock speeds as low as 1 GHz all day.
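Those latencies aren't secret, just buried. On Linux, for instance, the kernel exposes the advertised exit latency of every C-state right next to the clock speed each core is actually running at. A quick sketch (Linux-specific sysfs paths; the numbers are whatever the firmware and kernel report, not independent measurements):

```python
# Sketch: show core 0's current clock speed and the advertised exit
# latency of every idle (C-) state. Assumes Linux with cpufreq/cpuidle
# sysfs support; values are as reported by the kernel.
from pathlib import Path

cpu0 = Path("/sys/devices/system/cpu/cpu0")

# Current clock speed, reported in kHz.
khz = int((cpu0 / "cpufreq" / "scaling_cur_freq").read_text())
print(f"cpu0 current clock: {khz / 1_000_000:.2f} GHz")

# Advertised exit latency, in microseconds, for each idle state.
for state in sorted((cpu0 / "cpuidle").glob("state[0-9]*")):
    name = (state / "name").read_text().strip()
    latency = int((state / "latency").read_text())
    print(f"{state.name}: {name:12s} exit latency ~{latency} μs")
```

The deepest states listed there are where the hundred-microsecond figures above come from.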
My experience in the field is that turning off green power saving features can occasionally double or even triple application performance, with a minimum expected boost of 20% in almost all cases. This is an email that was sent to an IT manager at a customer where disabling power management was the only change made:
@@@
The typical improvement is about 30% and tends to be higher on servers with lighter workloads. Modern, high core-count virtualised cluster hosts tend to be ideal candidates for seeing large improvements, whereas overloaded servers see little or no benefit – but in that case power management doesn't save any power either, so turning it off has no downside.
Look at it this way: Are you happy that you've purchased millions of dollars worth of server equipment, just to have nearly half of its capability thrown out the window for the want of a button press?
There are some “downsides” that aren't:
So what are the real downsides? There are only a few:
A common misconception is that CPU power management will hurt CPU-intensive workloads such as batch processing or reporting the most, but the exact opposite is true! A single expensive query or job will quickly spin all CPU cores up to their maximum speed, and they will stay there the whole time. The green power-saving features have a totally negligible effect, adding microseconds to processes that take seconds or minutes.
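You can watch this happen for yourself. The sketch below (again Linux-only, and the spin loop isn't pinned to a particular core, so treat the output as illustrative) starts some pointless arithmetic and samples the reported clock speed while it runs; on most systems the clock jumps to its maximum almost immediately:

```python
# Sketch: give the CPU some pointless work and watch the reported clock
# speed while it runs. Assumes Linux with cpufreq sysfs; the spin thread
# is not pinned to core 0, so the numbers are illustrative only.
import threading
import time
from pathlib import Path

FREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")

def burn(stop: threading.Event) -> None:
    x = 0
    while not stop.is_set():
        x += 1  # busywork, nothing more

stop = threading.Event()
threading.Thread(target=burn, args=(stop,), daemon=True).start()

for _ in range(10):
    print(f"cpu0 at {int(FREQ.read_text()) / 1000:.0f} MHz")
    time.sleep(0.2)

stop.set()
```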
Server-to-server networking is a very different story. I discovered ACPI C-states and their dramatic effect on performance around the time that 10 Gbps Ethernet started seeing widespread adoption. It was often slower in practice than the 1 Gbps networking it replaced! This was completely counter-intuitive, and I saw many large enterprise customers roll out millions of dollars worth of kit only to see a reduction in throughput. Even simple file copies that used to speed along at “wire rate” on 1 Gbps would drop to as little as 600 Mbps on theoretically 10x faster equipment!
The “lightbulb moment” for me was when I discovered that a simple busy loop along the lines of a while(true) {} snippet in a script would increase network throughput. I used to run this demo for clients to show how forcing the CPU clock higher with busywork would improve things.
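For what it's worth, the demo really was that crude. Here's a rough recreation of it in Python (the process count and the interactive prompt are arbitrary choices): spin one process per core so that none of them is allowed to drop into a deep C-state, then re-run the file copy or network benchmark alongside it.

```python
# Rough recreation of the busywork demo: keep every core spinning so none
# of them can drop into a deep sleep state, then re-run the network test.
# The process count and the interactive prompt are arbitrary choices.
import multiprocessing as mp

def spin() -> None:
    while True:   # the infamous while(true) {}
        pass

if __name__ == "__main__":
    workers = [mp.Process(target=spin, daemon=True) for _ in range(mp.cpu_count())]
    for w in workers:
        w.start()
    try:
        input("Cores are busy - run your file copy now, then press Enter to stop.\n")
    finally:
        for w in workers:
            w.terminate()
```

Needless to say, burning a whole host's worth of CPU just to keep the clocks up is a party trick, not a fix – the real fix is turning the power management features off, which is the whole point of this article.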
@@@ 10 Gbps is worse than 1 Gbps! @@@ timeline of packets back and forth
@@@ table of Dell, HPE, etc… Static Max Power @@@ VMware ACPI-C state screenshot @@@ Windows power management for Hyper-V or bare metal
@@@ Microseconds matter @@@ Brent Ozar's article @@@ VMware latency sensitivity tuning