Power Management

Comprehensive guide to hardware power management capabilities and techniques for optimizing energy consumption in HPC systems. Learn how to configure CPU frequency scaling, idle power states, and runtime power management systems to balance performance and energy efficiency.

Prerequisites

  • Basic understanding of HPC architecture and CPU fundamentals

  • Familiarity with Linux command-line tools and sysfs interface

  • Knowledge of system administration and performance monitoring

  • Understanding of basic power/energy concepts (Watts, Joules)

Description

This module provides a comprehensive introduction to hardware power management in HPC systems. It explores the mechanisms that modern CPUs provide to control power consumption, from frequency scaling and idle states to hardware-managed performance optimization.

Episode 0: Power Management Hardware Knobs covers the fundamental concepts of power management. Topics include the physics of power consumption (the dynamic power relationship P=CV²f), Dynamic Voltage and Frequency Scaling (DVFS), CPU performance states (P-states), idle power saving states (C-states), thermal throttling (T-states), and system sleep states (S-states). The episode explains why power management matters for HPC: cost, performance constraints, reliability, and sustainability.
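The P=CV²f relationship can be illustrated with a few lines of Python. This is a minimal sketch, not course material: the function name dynamic_power, the capacitance value, and both operating points are hypothetical numbers chosen only to show how voltage and frequency scale the result.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Dynamic switching power P = C * V^2 * f, in watts."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating points (illustrative numbers, not measured data).
C_EFF = 1.0e-9  # assumed effective switched capacitance in farads

nominal = dynamic_power(C_EFF, 1.0, 3.0e9)  # 1.0 V at 3.0 GHz
scaled = dynamic_power(C_EFF, 0.8, 2.4e9)   # 0.8 V at 2.4 GHz

# Ratio of dynamic power after scaling both voltage and frequency by 0.8.
ratio = scaled / nominal
print(f"nominal: {nominal:.2f} W, scaled: {scaled:.2f} W, ratio: {ratio:.3f}")
```

Because voltage enters quadratically, lowering voltage and frequency together by 20% reduces dynamic power by a factor of 0.8³ ≈ 0.51 in this model, which is why DVFS lowers voltage along with frequency. The model ignores static leakage power.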

Episode 1: Power Management Implementation and Runtime Systems provides technical deep-dives into the Linux interfaces and mechanisms that enable power management. It covers scaling drivers (intel_pstate, acpi-cpufreq), scaling governors (performance, powersave, ondemand, conservative, userspace), register-level frequency control through model-specific registers (MSRs), Intel Turbo Boost technology, Hardware P-States (HWP/SpeedShift) for autonomous frequency selection, GPU frequency management, frequency transition latency, and practical runtime systems for automatic power optimization.
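As a rough illustration of the Linux interface mentioned above, the following sketch reads the cpufreq attributes of one CPU from sysfs. It assumes the standard location /sys/devices/system/cpu/cpuN/cpufreq; the helper name read_cpufreq and its sysfs_root parameter are hypothetical and exist only so the function can be pointed at another directory. Which attributes are present depends on the active scaling driver, so missing files are reported as None.

```python
from pathlib import Path

# Standard cpufreq policy attributes; frequencies are reported in kHz.
CPUFREQ_FILES = (
    "scaling_driver",
    "scaling_governor",
    "scaling_available_governors",
    "scaling_min_freq",
    "scaling_max_freq",
    "scaling_cur_freq",
)


def read_cpufreq(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the cpufreq attributes of one CPU as a dict.

    Attributes that are missing or unreadable are reported as None.
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    info = {}
    for name in CPUFREQ_FILES:
        try:
            info[name] = (base / name).read_text().strip()
        except OSError:
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in read_cpufreq(0).items():
        print(f"{key:30s} {value}")
```

Reading these files does not require elevated privileges on most systems; changing the governor or frequency limits means writing to the same files, which typically requires root.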

Course Topics

  • Power-Performance Trade-off: Understanding the physics (P = CV²f + V·I_leak, i.e. dynamic switching power plus static leakage power)

  • Dynamic Voltage and Frequency Scaling (DVFS): How modern CPUs adjust power

  • Performance States (P-states): Frequency-voltage pairs from turbo to minimum

  • Idle Power States (C-states): Sleep states and their energy efficiency

  • Thermal Throttling (T-states): Automatic frequency reduction under thermal stress

  • ACPI Standard: Industry-standard power management framework

  • Linux Scaling Drivers: intel_pstate vs acpi-cpufreq interfaces

  • Scaling Governors: Policies for automatic frequency selection (ondemand, powersave, etc.)

  • Model-Specific Registers (MSRs): Direct hardware frequency control mechanisms

  • Intel Turbo Boost: Opportunistic frequency boosting beyond nominal

  • Hardware P-State (HWP): Autonomous hardware frequency management

  • GPU Frequency Scaling: NVIDIA and AMD GPU power control

  • Runtime Power Management Systems: Automatic optimization frameworks

Target Audience

Level: Intermediate to Advanced

Prerequisites: Comfortable with Linux systems, basic Python, an understanding of CPU architectures, and familiarity with the sysfs and /proc filesystems. System administration experience is helpful but not required.

Language: English

Technical Requirements

  • Linux system with frequency scaling support (intel_pstate or acpi-cpufreq driver)

  • Root or power management group access for sysfs writes

  • Python 3.7+ with NumPy for analysis (optional)

  • Access to HPC system or multi-core workstation for experiments
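The requirements above can be checked quickly before starting the exercises. The sketch below is one possible check, under stated assumptions: check_requirements and its sysfs_root parameter are hypothetical names, cpu0 is taken as representative of all cores, and "write access" is approximated by asking whether the current user may write scaling_governor.

```python
import os
import sys
from pathlib import Path


def check_requirements(sysfs_root="/sys/devices/system/cpu"):
    """Return a dict mapping each requirement to True or False."""
    cpufreq = Path(sysfs_root) / "cpu0" / "cpufreq"
    governor = cpufreq / "scaling_governor"
    return {
        "python_3_7_or_newer": sys.version_info >= (3, 7),
        # The cpufreq directory exists only when a scaling driver is loaded.
        "cpufreq_available": cpufreq.is_dir(),
        # os.access reports whether the current user may write the file;
        # this is usually the case only for root.
        "governor_writable": governor.is_file() and os.access(governor, os.W_OK),
        "multi_core": (os.cpu_count() or 1) > 1,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:22s} {'ok' if ok else 'missing'}")
```

A "missing" result for governor_writable is expected when the script runs as an unprivileged user; the read-only parts of the exercises still work in that case.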

Instructors

Ondrej Vysocky is a senior researcher at the Infrastructure Research Laboratory within the IT4Innovations National Supercomputing Center. His work focuses primarily on reducing the energy consumption of supercomputers, which lowers operating costs by millions of crowns annually and significantly reduces the carbon footprint of computations.

Learning outcomes

This module prepares HPC practitioners, system administrators, and performance engineers to understand and optimize power consumption through hardware power management techniques.

By the end of this module, learners should be able to:

  • Understand power physics: Explain the relationship between voltage, frequency, and power consumption (P=CV²f), and discuss why power becomes a performance-limiting constraint

  • Navigate power management hierarchy: Explain P-states (performance), C-states (idle), T-states (thermal), and S-states (system sleep) and their appropriate use cases

  • Control CPU frequency: Use Linux sysfs interfaces to query and adjust CPU frequency scaling parameters

  • Select appropriate scaling governors: Choose between performance, powersave, ondemand, conservative, and userspace governors based on workload characteristics

  • Interpret frequency scaling data: Read frequency information from sysfs and understand register-level frequency control through model-specific registers (MSRs)

  • Apply Turbo Boost effectively: Understand the relationship between instruction set (SSE vs AVX-512) and turbo frequency, and decide when to enable/disable turbo

  • Leverage Hardware P-State: Understand how HWP (SpeedShift) improves frequency management responsiveness and explain its benefits over OS-controlled scaling

  • Manage GPU frequency: Use vendor tools (nvidia-smi, rocm-smi) to monitor and control GPU frequency scaling

  • Design power management strategies: Create appropriate power management policies for different workloads (batch, interactive, accelerated)

  • Implement runtime power systems: Deploy automatic power management systems that balance energy and performance constraints

See also

Credit

Don’t forget to check out the additional course materials from the CASTIEL Energy Efficient Computing webinar series.

License

Note

To module authors: For code you may use any OSI-approved license as mentioned in https://spdx.org/licenses/, such as Apache License 2.0, GNU GPLv3, MIT. Please make sure to update the deed above and LICENSE.code file accordingly.