Instructor guide

Why we teach this lesson

Power management is essential for modern HPC because:

  1. Cost Reduction: Energy is a major operating cost and can account for 20-30% of HPC operational budgets, depending on site and electricity prices. Understanding power management enables targeted cost reduction through frequency scaling and power capping. On workloads that are not compute-bound, this can often save 10-30% of power with little or no loss of application performance.

  2. Performance Is Power-Limited: Modern CPUs and accelerators are power-limited devices. You cannot run all cores at maximum turbo frequency indefinitely, because thermal and power-delivery constraints enforce a power envelope. Understanding this relationship is critical for performance modeling and prediction.

  3. Thermal Management: Many data centers operate close to their cooling capacity. Poor power management leads to thermal throttling, system failures, and operational issues. Students must understand how frequency scaling prevents thermal emergencies.

  4. System Reliability: Power delivery infrastructure and thermal systems are reliability bottlenecks. Nearly all the power drawn by IT equipment ends up as heat, so in a facility drawing 2 MW, a 10% reduction in average power removes ~200 kW of cooling load. That can avoid infrastructure upgrades and improve MTBF.

  5. Heterogeneous Architectures: Modern HPC systems (CPU+GPU, multiple CPU types) require heterogeneous power strategies. Students need to understand how different components interact in power-constrained environments.

  6. Regulatory and Sustainability: Increasing environmental regulations and carbon footprint accountability make power efficiency a business requirement. Understanding power management is now a professional skill for HPC practitioners.

Timing

Episode 0: Power Management Hardware Knobs

  • Power-Performance trade-off physics: 15 min (with equations and intuition)

  • CPU frequency scaling (DVFS) fundamentals: 15 min

  • P-states, C-states, T-states, S-states overview: 20 min (with examples)

  • ACPI standard and OS role: 10 min

  • Why power management matters: 10 min

  • Total: ~70 minutes

Episode 1: Implementation and Runtime Systems

  • Scaling drivers (intel_pstate vs acpi-cpufreq): 15-20 min

  • Scaling governors (performance, powersave, ondemand, conservative): 20-25 min

  • MSR-level frequency control: 10 min

  • Intel Turbo Boost: 10-15 min

  • Hardware P-State (HWP): 10-15 min

  • GPU frequency management: 10 min

  • Frequency transition latency and practical implications: 10 min

  • Runtime power management systems: 15 min

  • Total: ~100-120 minutes

Quiz and Exercises: 60-90 minutes depending on depth

Recommended structure: Teach Episode 0 on Day 1 (fundamentals), Episode 1 on Day 2 (implementation), with hands-on exercises on actual systems.

Hardware requirements

Minimum Setup for Demonstrations:

  • Linux workstation or HPC compute node with:

    • Intel (Haswell or newer) or AMD (Zen or newer) CPU

    • Linux kernel 4.0+ with cpufreq driver enabled

    • 4+ GB RAM

    • 10 GB disk space

Essential Access:

  • Root or group permissions for /sys/devices/system/cpu/*/cpufreq/ writes

  • Ability to read MSRs (load the msr kernel module with modprobe msr; reading /dev/cpu/*/msr requires root)

  • cpupower or turbostat tools available

Optional but Recommended:

  • Intel Xeon Scalable or AMD EPYC processor (for realistic HPC hardware)

  • Multiple sockets/cores for demonstrating heterogeneous frequency management

  • NVIDIA or AMD GPU for accelerator frequency scaling demos

  • Thermal monitoring (lm-sensors) for demonstrating thermal throttling

Software Stack:

  • Linux kernel with intel_pstate or acpi-cpufreq driver

  • linux-tools package containing cpupower, turbostat

  • Python 3.7+ with NumPy for data analysis

  • Optional: perf, likwid for power measurement
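For the "measure time and energy" labs, package energy can also be read directly from the Linux powercap interface for Intel RAPL, without extra tools. The sketch below is minimal and makes some assumptions. The package-0 domain is assumed to be /sys/class/powercap/intel-rapl:0, but domain numbering varies by node. On current kernels energy_uj is readable only by root. The counter wraps at max_energy_range_uj, so the delta is corrected for at most one wraparound.

```python
import time
from pathlib import Path

# Assumed package-0 domain; list /sys/class/powercap to see what your node exposes.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(path: Path) -> int:
    """Read an integer microjoule value from a sysfs file."""
    return int(path.read_text().strip())


def energy_delta_uj(start_uj: int, end_uj: int, max_range_uj: int) -> int:
    """Energy used between two readings, allowing for at most one counter wraparound."""
    if end_uj >= start_uj:
        return end_uj - start_uj
    return (max_range_uj - start_uj) + end_uj


def measure(func, domain: Path = RAPL_DOMAIN):
    """Run func() and return (elapsed_s, joules, avg_watts) for one RAPL domain."""
    max_range = read_uj(domain / "max_energy_range_uj")
    e0 = read_uj(domain / "energy_uj")
    t0 = time.perf_counter()
    func()
    elapsed = time.perf_counter() - t0
    e1 = read_uj(domain / "energy_uj")
    joules = energy_delta_uj(e0, e1, max_range) / 1e6
    watts = joules / elapsed if elapsed > 0 else float("nan")
    return elapsed, joules, watts
```

As root, for example: elapsed, joules, watts = measure(lambda: sum(i * i for i in range(10**7))). This measures only the package domain. Other domains, such as DRAM where supported, appear as separate intel-rapl entries.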

Virtual Environment Compromise: If no appropriate hardware is available (virtual machines often do not expose cpufreq controls):

  • Use pre-recorded sysfs traces from actual systems

  • Provide simulation of frequency switching with synthetic data

  • Use vendor documentation with concrete examples

  • Demo frequency limiting on whatever CPU is available, even if its frequency range is narrow
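For the simulation option, a synthetic trace can stand in for real sysfs samples. This is a minimal sketch. The busy/idle step pattern, jitter, and frequency values are invented for illustration and do not describe any particular CPU. The trace mimics a governor that raises frequency during busy phases and lowers it during idle phases. Values are in kHz to match scaling_cur_freq.

```python
import random


def synthetic_trace(n_samples=60, busy_khz=3_000_000, idle_khz=1_200_000,
                    period=20, jitter_khz=50_000, seed=0):
    """Return (t_seconds, freq_khz) samples alternating busy and idle phases.

    All defaults are illustrative, not measurements. period must be >= 2;
    the first half of each period is busy, the second half idle.
    """
    rng = random.Random(seed)  # fixed seed so every student sees the same trace
    trace = []
    for t in range(n_samples):
        busy = (t // (period // 2)) % 2 == 0
        base = busy_khz if busy else idle_khz
        trace.append((t, base + rng.randint(-jitter_khz, jitter_khz)))
    return trace


if __name__ == "__main__":
    for t, khz in synthetic_trace(n_samples=10, period=4):
        print(f"{t:3d}s  {khz / 1e6:.2f} GHz")
```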

Preparing exercises

1 Week Before:

  1. Inventory your hardware:

    # Check CPU
    cat /proc/cpuinfo | grep "model name"
    
    # Check supported frequencies (sysfs reports values in kHz)
    cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq
    cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
    
    # Check available governors
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
    
  2. Verify scaling driver:

    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
    
  3. Test sysfs access:

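    # Ensure you can read and write (intel_pstate systems only)
    cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
    # "sudo echo 90 > file" fails because the redirection runs without root; use tee
    echo 90 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct   # Test write
    echo 100 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct  # Restore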
    
  4. Prepare measurement tools:

    sudo apt install linux-tools-generic linux-tools-$(uname -r)  # Debian/Ubuntu: cpupower, turbostat
    # RHEL/Fedora: sudo dnf install kernel-tools
    
    # Install likwid for power measurement (if available)
    # Pre-compile or document installation procedure
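The inventory checks above can also be collected in one script, so the results can be saved alongside the lesson notes. This is a minimal, read-only sketch in Python, the lesson's analysis language, so it needs no root. The function names and the optional root-path argument are illustrative and exist so the script can be pointed at a copy of sysfs. cpuinfo_min_freq and cpuinfo_max_freq are reported in kHz and converted to GHz here.

```python
import shutil
from pathlib import Path


def read_or_none(path: Path):
    """Return the stripped contents of a sysfs file, or None if missing/unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def inventory(cpu_root: Path = Path("/sys/devices/system/cpu")) -> dict:
    """Collect cpufreq facts for cpu0 plus availability of the command-line tools."""
    cf = cpu_root / "cpu0" / "cpufreq"
    info = {
        "driver": read_or_none(cf / "scaling_driver"),
        "governor": read_or_none(cf / "scaling_governor"),
        "governors": (read_or_none(cf / "scaling_available_governors") or "").split(),
        "intel_pstate": (cpu_root / "intel_pstate").is_dir(),
        "tools": {t: shutil.which(t) is not None for t in ("cpupower", "turbostat")},
    }
    for key in ("cpuinfo_min_freq", "cpuinfo_max_freq"):
        khz = read_or_none(cf / key)
        info[key + "_ghz"] = int(khz) / 1e6 if khz else None  # sysfs reports kHz
    return info


if __name__ == "__main__":
    for k, v in inventory().items():
        print(f"{k:22s} {v}")
```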
    

Day Before:

  1. Set up clean baseline:

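    # Reset to default power settings (intel_pstate; writes need root)
    echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
    echo 100 | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
    # Do NOT set min_perf_pct to 100: that pins every core at maximum frequency.
    # Restore the value you recorded before changing anything ($ORIG_MIN_PCT is a placeholder).
    echo "$ORIG_MIN_PCT" | sudo tee /sys/devices/system/cpu/intel_pstate/min_perf_pct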
    
  2. Create demonstration scripts:

    • Script to measure frequency vs power (run application at different frequencies)

    • Script to monitor thermal throttling

    • Script to compare governors

  3. Prepare example data:

    • Baseline frequency/power measurements for your system

    • Example traces showing frequency changes over time

    • Thermal throttling event logs

  4. Test all exercises in advance:

    • Run through Episode 0 hands-on activities

    • Test governor switching on all CPUs

    • Verify turbostat output format
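Because the demos change global settings, it helps to snapshot the intel_pstate knobs before the lesson and restore them afterwards, rather than guessing the defaults. This is a minimal sketch. It assumes the intel_pstate driver in active mode, meaning the files under /sys/devices/system/cpu/intel_pstate exist. The JSON file name is arbitrary. Reading needs no privileges, but restoring requires root. intel_pstate may clamp each limit against the other, so min_perf_pct is written both before and after max_perf_pct.

```python
import json
from pathlib import Path

PSTATE = Path("/sys/devices/system/cpu/intel_pstate")
KNOBS = ("no_turbo", "max_perf_pct", "min_perf_pct")
# min is written twice so a value clamped by the old max is corrected after max changes
RESTORE_ORDER = ("no_turbo", "min_perf_pct", "max_perf_pct", "min_perf_pct")


def snapshot(base: Path = PSTATE) -> dict:
    """Return the current value of every intel_pstate knob that exists."""
    return {k: (base / k).read_text().strip() for k in KNOBS if (base / k).exists()}


def restore(saved: dict, base: Path = PSTATE) -> None:
    """Write previously saved values back (requires root on a real system)."""
    for k in RESTORE_ORDER:
        if k in saved:
            (base / k).write_text(saved[k] + "\n")


def save_baseline(path: str, base: Path = PSTATE) -> None:
    """Store a snapshot as JSON so it survives reboots and shell sessions."""
    Path(path).write_text(json.dumps(snapshot(base), indent=2))


def load_baseline(path: str) -> dict:
    """Load a snapshot written by save_baseline."""
    return json.loads(Path(path).read_text())
```

A week before, run save_baseline("pstate_baseline.json"). After each demo, run restore(load_baseline("pstate_baseline.json")) as root.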

Day Of:

  1. System warmup: Boot 30 minutes early, run sustained load to establish thermal state

  2. Verify kernel parameters: Confirm all expected sysfs files exist and are readable

  3. Check frequency stability: Run measurement script and verify no throttling occurring

  4. Network setup: Ensure students can reach the demo nodes (e.g., over SSH) to read sysfs, or have pre-shared results
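For the frequency-stability check, Intel x86 kernels expose per-CPU thermal throttling event counters under /sys/devices/system/cpu/cpu*/thermal_throttle/. This is a minimal sketch that compares core_throttle_count before and after the warm-up run. If the counters are absent on your platform, it returns empty results instead of failing. The function names are illustrative.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def throttle_counts(root: Path = CPU_ROOT, name: str = "core_throttle_count") -> dict:
    """Map 'cpuN' -> throttle event count for every CPU that exposes the counter."""
    counts = {}
    for f in sorted(root.glob(f"cpu[0-9]*/thermal_throttle/{name}")):
        counts[f.parent.parent.name] = int(f.read_text().strip())
    return counts


def new_events(before: dict, after: dict) -> dict:
    """CPUs whose counter increased between two snapshots, with the increase."""
    return {cpu: after[cpu] - before.get(cpu, 0)
            for cpu in after if after[cpu] > before.get(cpu, 0)}
```

To use it, run before = throttle_counts(), run the warm-up load, then print new_events(before, throttle_counts()). Pass name="package_throttle_count" to check package-level events instead.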

Other practical aspects

Teaching Approach:

  • Start with physics (P=CV²f) to build intuition before implementation details

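  • Use concrete numbers: in the idealized model where voltage scales with frequency, dynamic power goes as f³, so a 20% frequency cut reduces dynamic power by ~49% (0.8³ ≈ 0.51). Measured package savings are smaller because of static power, which makes a good follow-up discussion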

  • Compare to real systems: “Your laptop does this—here’s how to see it”

  • Hands-on first, then explanation: Let students observe frequency changes via sysfs, then explain what they’re seeing
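The P=CV²f intuition can be made explicit by comparing its two limiting cases. If voltage stays fixed, power scales with f. If voltage is assumed to scale in proportion to frequency, dynamic power scales as f³. This is a minimal sketch that ignores static (leakage) power, so its output is an idealized bound on dynamic savings, not a prediction of measured package power.

```python
def relative_dynamic_power(freq_ratio: float, voltage_scales: bool = True) -> float:
    """Dynamic power relative to baseline for P = C * V**2 * f.

    freq_ratio: new_frequency / baseline_frequency.
    voltage_scales: True assumes V proportional to f (idealized DVFS, P ~ f**3);
                    False keeps V fixed (frequency-only scaling, P ~ f).
    C is assumed unchanged and cancels out of the ratio.
    """
    v_ratio = freq_ratio if voltage_scales else 1.0
    return v_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    for new, old in ((2.4, 3.0), (2.0, 3.0), (1.5, 3.0)):
        r = new / old
        print(f"{old} -> {new} GHz: saves {1 - relative_dynamic_power(r, False):.0%} (V fixed), "
              f"{1 - relative_dynamic_power(r, True):.0%} (V ~ f)")
```

Putting both columns next to turbostat's measured package power shows students where reality falls relative to the two idealized cases.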

Lab Exercise Ideas:

  1. Frequency scaling exercise:

    • Run application at different frequencies

    • Measure time and energy consumption

    • Plot performance vs frequency

  2. Governor comparison:

    • Run identical workload with different governors

    • Compare frequency transitions and power usage

    • Discuss trade-offs

  3. Thermal throttling simulation:

    • Stress-test CPU to trigger throttling

    • Observe frequency drops via sysfs

    • Measure performance impact

  4. Power budget simulation:

    • Given a 100 W budget for a 4-socket node (scale the figure to your hardware)

    • Determine allowable frequency per-socket

    • Implement via sysfs settings
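For the power budget exercise, a simple model lets students reason on paper before touching sysfs. This is a minimal sketch. It assumes a hypothetical per-socket model P(f) = P_static + k·f³ (idealized DVFS), with coefficients chosen only for illustration. Students should replace them with values fitted from their own turbostat or RAPL measurements. The function finds the highest uniform frequency that keeps the whole node within budget, clamped to the hardware range.

```python
def socket_power(f_ghz: float, p_static_w: float, k_w_per_ghz3: float) -> float:
    """Hypothetical per-socket model: static power plus an idealized f**3 dynamic term."""
    return p_static_w + k_w_per_ghz3 * f_ghz ** 3


def max_uniform_freq(budget_w: float, sockets: int, p_static_w: float,
                     k_w_per_ghz3: float, f_min: float, f_max: float):
    """Highest frequency (GHz) all sockets can share within the node budget.

    Solves sockets * (p_static + k * f**3) <= budget for f and clamps to f_max.
    Returns None if even f_min would exceed the budget.
    """
    headroom = budget_w / sockets - p_static_w  # dynamic power left per socket
    if headroom <= 0:
        return None
    f = (headroom / k_w_per_ghz3) ** (1.0 / 3.0)
    if f < f_min:
        return None
    return min(f, f_max)


if __name__ == "__main__":
    # Illustrative numbers only: 10 W static and k = 1.2 W/GHz^3 per socket.
    f = max_uniform_freq(100, 4, 10.0, 1.2, 0.8, 3.5)
    print(f"uniform cap: {f:.2f} GHz")
```

To apply a result, convert it to kHz (f × 1,000,000) and write it to each CPU's scaling_max_freq.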

Engaging Discussion Topics:

  • “Why would you ever use the performance governor?” (Answer: when you have no power budget constraints)

  • “What happens if you set minimum frequency > maximum?” (Reveal common configuration error)

  • “Why do different instruction sets have different turbo frequencies?” (AVX-512 power intensity)

  • “Who controls frequency on your laptop vs supercomputer?” (OS and hardware on a laptop vs administrator and batch-system policy on a supercomputer)

Interesting questions you might get

Q: “If lower frequency saves power, why not just run at 1 GHz always?”
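A: Because frequency directly impacts execution time. A purely compute-bound job that takes 1 hour at 3 GHz takes about 3 hours at 1 GHz; memory- or I/O-bound jobs slow down less. Meanwhile static power, memory, and the rest of the node keep drawing power for the whole run, so total energy (power × time) can rise even though instantaneous power falls, and the node stays occupied, and billed, for longer. If your deadline is tight, the performance loss dominates. There’s always a trade-off.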
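A toy model makes the energy side of this answer concrete. This is a minimal sketch with invented parameters, not measurements. Runtime is split into a compute-bound part that scales as 1/f and a memory or I/O part that does not. Node power is assumed to be a fixed baseline plus an idealized f³ dynamic term. With a nonzero baseline, the lowest-energy frequency is usually well above the minimum.

```python
def runtime_s(f_ghz, t_ref_s=3600.0, f_ref_ghz=3.0, compute_fraction=1.0):
    """Runtime at f given runtime t_ref at f_ref; only the compute part scales with 1/f."""
    compute = t_ref_s * compute_fraction * (f_ref_ghz / f_ghz)
    other = t_ref_s * (1.0 - compute_fraction)
    return compute + other


def energy_j(f_ghz, p_base_w=100.0, k_w_per_ghz3=5.0, **kw):
    """Energy = (baseline + k * f**3) * runtime. All coefficients are hypothetical."""
    return (p_base_w + k_w_per_ghz3 * f_ghz ** 3) * runtime_s(f_ghz, **kw)


if __name__ == "__main__":
    freqs = [1.0 + 0.25 * i for i in range(9)]  # sweep 1.0 .. 3.0 GHz
    for cf in (1.0, 0.5):
        best = min(freqs, key=lambda f: energy_j(f, compute_fraction=cf))
        print(f"compute_fraction={cf}: lowest-energy frequency in sweep = {best} GHz")
```

Lowering compute_fraction shows why memory-bound codes tolerate lower frequencies: their runtime barely grows while dynamic power still drops.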

Q: “Can we predict power from frequency alone?”
A: No. Power depends on frequency, voltage, workload type (SSE vs AVX-512), memory bandwidth, I/O activity, temperature, and other factors. The P=CV²f equation describes only dynamic power; real CPUs also have leakage (static) power, memory subsystem power, etc.

Q: “Why does my CPU frequency keep changing when I run the same application?”
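A: The frequency policy is responding to observed utilization. That policy is a governor such as ondemand or schedutil under acpi-cpufreq, or the processor itself under intel_pstate with HWP. Turbo headroom and power or thermal limits also shift the frequency. If your application has I/O waits or irregular parallelism, frequency bounces around. This is normal and usually fine, but can be problematic for latency-critical applications.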

Q: “What’s the difference between intel_pstate and acpi-cpufreq?”
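A: intel_pstate is newer and Intel-specific. It selects P-states directly through model-specific registers, or hands the choice to HWP, instead of using the P-state table from ACPI firmware, and it offers finer control. acpi-cpufreq is older, relies on ACPI firmware tables, and is more portable across vendors. Most modern Intel systems use intel_pstate by default.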

Q: “Does limiting frequency affect reliability?”
A: No—if anything, it improves reliability by reducing heat/stress. You’re operating within the CPU’s safe power envelope, just at a lower level.

Q: “Can I change frequency of individual cores?”
A: It depends on the processor. Many Intel server CPUs since Haswell-EP support per-core P-states, whereas many older and client CPUs run all cores of a package at one frequency. Check your CPU documentation and the cpufreq sysfs files.
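One way to check a given node is to see which CPUs share each cpufreq policy. CPUs listed together in a policy's related_cpus file share one frequency domain. This is a minimal, read-only sketch; the function name is illustrative.

```python
from pathlib import Path


def frequency_domains(cpufreq_root: Path = Path("/sys/devices/system/cpu/cpufreq")) -> dict:
    """Map each policyN directory to the CPU ids listed in its related_cpus file."""
    domains = {}
    for policy in sorted(cpufreq_root.glob("policy[0-9]*")):
        cpus = (policy / "related_cpus").read_text().split()
        domains[policy.name] = [int(c) for c in cpus]
    return domains


if __name__ == "__main__":
    doms = frequency_domains()
    shared = {p: c for p, c in doms.items() if len(c) > 1}
    print(f"{len(doms)} policies; {len(shared)} shared by more than one CPU")
```

One policy per CPU suggests per-core control. A single policy listing every CPU means all cores move together.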

Q: “What’s the relationship between power management and performance monitoring?”
A: Power management changes CPU frequency, which affects how performance counters should be interpreted. A count of 2 million cycles corresponds to 2 ms at 1 GHz but only 1 ms at 2 GHz. Students must understand this relationship for accurate performance analysis.
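Two calculations are worth doing on the board: converting a cycle count to time, and estimating the frequency actually delivered. Tools such as turbostat derive the average busy frequency from the APERF and MPERF counters. APERF counts at the actual core clock and MPERF at a fixed reference rate while the core is active, so their ratio scales the nominal frequency. This is a minimal sketch of both calculations. The counter deltas in the demo are made-up numbers, and reading the real MSRs requires root and the msr module.

```python
def cycles_to_seconds(cycles: int, freq_hz: float) -> float:
    """Wall-clock time for a cycle count at a constant core frequency."""
    return cycles / freq_hz


def effective_mhz(delta_aperf: int, delta_mperf: int, base_mhz: float) -> float:
    """Average busy frequency: nominal (base) frequency scaled by APERF/MPERF."""
    if delta_mperf == 0:
        raise ValueError("MPERF did not advance (core idle for the whole interval)")
    return base_mhz * delta_aperf / delta_mperf


if __name__ == "__main__":
    print(cycles_to_seconds(2_000_000, 1e9), cycles_to_seconds(2_000_000, 2e9))
    # Hypothetical deltas: APERF advanced 30% faster than MPERF on a 2100 MHz part (turbo).
    print(effective_mhz(1_300_000, 1_000_000, 2100.0))
```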

Q: “Can we use power management for security (preventing side-channel attacks)?”
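A: It cuts both ways. Power management interfaces can themselves leak information. The Hertzbleed attack (2022) used data-dependent frequency changes as a timing side channel. Linux also restricted unprivileged access to RAPL energy counters after the Platypus attack (2020). Pinning the frequency can remove some of these leaks, but this is active research, not standard practice.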

Typical pitfalls

Misconception 1: “Frequency reduction saves power proportionally”

  • The Problem: Students assume “50% frequency = 50% power”

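  • Reality: Dynamic power ∝ V² × f, and a lower frequency also allows a lower voltage, so dynamic power falls faster than frequency. In the idealized V ∝ f model, halving frequency cuts dynamic power by 87.5% (0.5³ = 0.125). Static power and minimum-voltage limits make measured package savings smaller than this ideal; either way, power does not scale linearly with frequency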

  • How to catch it: Show mathematical derivation and empirical data

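  • Teaching tip: Use a concrete example. Cutting 3.0 GHz to 2.0 GHz reduces dynamic power by ~33% if the voltage stays fixed, and by ~70% ((2/3)³ ≈ 0.30) if the voltage drops in proportion. Compare both figures with package power measured by turbostat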

Misconception 2: “The performance governor is always best”

  • The Problem: Students assume maximum frequency = best performance always

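  • Reality: The performance governor keeps cores at maximum frequency even while they stall on memory or busy-wait on communication, spending power without speeding up the job. For loosely coupled or memory-bound jobs, ondemand can deliver nearly the same performance with power savings on the order of 15-20%, but measure the saving on the actual workload before quoting a figure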

  • How to catch it: Run actual measurements comparing governors on their workload

  • Teaching tip: Emphasize that performance and energy efficiency don’t always conflict

Misconception 3: “Turbo Boost is free frequency”

  • The Problem: Students think turbo comes without power cost

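  • Reality: Turbo operates within the same package power limit as non-turbo operation. The highest turbo frequencies are reachable only when few cores are active; when all cores are busy, the processor drops to a lower all-core turbo frequency, and further if power or thermal limits are hit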

  • How to catch it: Ask “If all cores boost to 4.5 GHz, what’s the total power?” and lead to power limit reasoning

  • Teaching tip: Explain that turbo is a reallocation of power budget, not additional power

Misconception 4: “Thermal throttling means your CPU is broken”

  • The Problem: Students panic when they observe frequency drops under load

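  • Reality: Thermal throttling is a normal protection mechanism that engages when the die approaches its temperature limit. Brief throttling under heavy load is expected; persistent throttling means the cooling is insufficient for the workload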

  • How to catch it: Distinguish between normal thermal management (fine) and persistent throttling (cooling insufficient)

  • Teaching tip: Show temperature vs frequency correlation; explain BIOS thermal limits

Misconception 5: “Changing frequency requires restarting the application”

  • The Problem: Students think frequency must be set before execution

  • Reality: Modern CPUs change frequency while running. Applications don’t need to be aware

  • How to catch it: Demo: change frequency via sysfs while application runs; show it responding to frequency change

  • Teaching tip: Use the watch command (e.g., watch -n1 "grep MHz /proc/cpuinfo") to continuously display frequency while the application runs
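If you want a recorded trace to plot afterwards, rather than a live view from watch, a small sampler over scaling_cur_freq is enough. This is a minimal sketch. The sampling interval and the CSV layout are arbitrary choices. scaling_cur_freq is the kernel's estimate of current frequency, reported in kHz and converted to MHz here.

```python
import csv
import sys
import time
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def read_freqs_khz(root: Path = CPU_ROOT) -> dict:
    """Return {cpu_id: current frequency in kHz} for CPUs exposing scaling_cur_freq."""
    freqs = {}
    for f in root.glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        cpu_id = int(f.parent.parent.name[3:])  # 'cpu12' -> 12
        freqs[cpu_id] = int(f.read_text().strip())
    return freqs


def sample(duration_s: float = 10.0, interval_s: float = 0.5,
           root: Path = CPU_ROOT, out=sys.stdout) -> None:
    """Write CSV rows (elapsed_s, cpu, mhz) while the demo application runs elsewhere."""
    writer = csv.writer(out)
    writer.writerow(["elapsed_s", "cpu", "mhz"])
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= duration_s:
            break
        for cpu, khz in sorted(read_freqs_khz(root).items()):
            writer.writerow([f"{elapsed:.2f}", cpu, khz // 1000])
        time.sleep(interval_s)
```

Start the application in one terminal and run sample(30, out=open("freq.csv", "w")) in another. Then change scaling_max_freq mid-run and watch the recorded frequencies follow.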

Misconception 6: “HWP is slower than OS control”

  • The Problem: Students assume hardware control is less responsive

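  • Reality: HWP moves P-state selection into the processor, which can respond on a much shorter timescale than an OS governor that re-evaluates load periodically (typically every few milliseconds). Hardware generally wins on responsiveness; exact latencies vary by processor, so check vendor documentation before quoting numbers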

  • How to catch it: Show latency numbers; explain hardware has specialized logic

  • Teaching tip: Emphasize that HWP enables low-latency adaptation that OS control cannot achieve

Misconception 7: “Scaling governors are mutually exclusive”

  • The Problem: Students think you must pick one globally

  • Reality: Governors are set per cpufreq policy, and each policy covers the CPUs that share a frequency domain, so cores in different policies can use different governors. System-wide policies are typical but not enforced

  • How to catch it: Show sysfs structure: /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

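  • Teaching tip: There is no single global scaling_governor file. Writing the same governor to every policy sets all CPUs, for example by looping over cpu*/cpufreq/scaling_governor or by running cpupower frequency-set -g. Each policy can also be set individually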
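A small helper shows both cases: setting every policy at once or only selected ones. This is a minimal sketch, and the function name is illustrative. It validates scaling_available_governors for all target policies before writing anything, because intel_pstate in active mode typically offers only performance and powersave. Writing requires root.

```python
from pathlib import Path

CPUFREQ = Path("/sys/devices/system/cpu/cpufreq")


def set_governor(governor: str, policies=None, root: Path = CPUFREQ) -> list:
    """Write `governor` to scaling_governor of the chosen policyN dirs (default: all).

    Returns the names of the policies written. Raises ValueError, before any write,
    if a target policy does not offer the requested governor.
    """
    targets = [p for p in sorted(root.glob("policy[0-9]*"))
               if policies is None or p.name in policies]
    for p in targets:  # validate everything first so a failure changes nothing
        available = (p / "scaling_available_governors").read_text().split()
        if governor not in available:
            raise ValueError(f"{p.name}: '{governor}' not in {available}")
    for p in targets:
        (p / "scaling_governor").write_text(governor + "\n")
    return [p.name for p in targets]
```

For example, set_governor("powersave") switches every policy, while set_governor("performance", ["policy0"]) changes only one frequency domain.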

Misconception 8: “Frequency scaling is a form of overclocking”

  • The Problem: Students confuse DVFS with overclocking

  • Reality: DVFS operates within CPU specifications. Overclocking exceeds them (risky). Different concepts

  • How to catch it: Clarify: DVFS = manufacturer-supported, safe. Overclocking = pushing beyond spec, risky

  • Teaching tip: Emphasize that we’re teaching manufacturer-supported features only