Quiz: HPC Power Management (Episodes 0 & 1)

Episode 0: Power Management Hardware Knobs - Fundamentals and Concepts

Multiple choice, single answer

Power Physics Fundamentals:

What is the primary component of CPU power consumption that scales with frequency?

  • A) Leakage power only

  • B) Dynamic power (CV²f)

  • C) Thermal power

  • D) I/O power

According to the power equation P = CV²f + I_leak·V, which change provides the greatest power savings?

  • A) Reducing frequency by 20%

  • B) Reducing voltage by 20%

  • C) Reducing capacitance by 20%

  • D) Increasing leakage current by 20%
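To compare the four options numerically, the sketch below evaluates P = CV²f + I_leak·V after each 20% change. The baseline operating point (25 nF, 1.0 V, 2.0 GHz, 5 A of leakage current) is a hypothetical placeholder. I_leak is held constant here, although real leakage also depends on voltage and temperature.

```python
def total_power(c: float, v: float, f: float, i_leak: float) -> float:
    """P = C*V^2*f + I_leak*V, with C in farads, V in volts, f in hertz, I_leak in amps."""
    return c * v**2 * f + i_leak * v

# Hypothetical baseline operating point, chosen only for illustration.
BASE = {"c": 25e-9, "v": 1.0, "f": 2.0e9, "i_leak": 5.0}

def relative_power(**changes: float) -> float:
    """Total power after overriding some BASE parameters, as a fraction of baseline power."""
    return total_power(**{**BASE, **changes}) / total_power(**BASE)

options = {
    "A) frequency -20%": relative_power(f=BASE["f"] * 0.8),
    "B) voltage -20%": relative_power(v=BASE["v"] * 0.8),
    "C) capacitance -20%": relative_power(c=BASE["c"] * 0.8),
    "D) leakage current +20%": relative_power(i_leak=BASE["i_leak"] * 1.2),
}
for name, ratio in options.items():
    print(f"{name}: {ratio:.3f} x baseline power")
```

Voltage wins because it appears squared in the dynamic term and linearly in the leakage term. Frequency and capacitance each appear only once.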

Why is dynamic voltage and frequency scaling (DVFS) more effective than reducing frequency alone?

  • A) Lower frequencies consume less energy per operation

  • B) Lowering frequency permits a lower voltage, and dynamic power falls quadratically with voltage

  • C) Cooling is more efficient

  • D) Memory bandwidth increases

CPU Frequency Scaling (P-states):

What is the primary purpose of P-states in CPU power management?

  • A) Control CPU temperature

  • B) Select the frequency and voltage for CPU cores

  • C) Manage idle power consumption

  • D) Coordinate with GPU frequency

Intel processors typically support how many P-states?

  • A) 2-5

  • B) 10-15

  • C) 20-40

  • D) 100+

What does P0 represent in the P-state hierarchy?

  • A) Minimum frequency

  • B) Turbo boost frequency

  • C) Baseline/nominal frequency

  • D) Shutdown state

Idle Power Management (C-states):

Which C-state represents the CPU actively executing instructions?

  • A) C0

  • B) C1

  • C) C2

  • D) C3

Approximately what percentage of power can be saved by transitioning from C0 to C3?

  • A) 5-10%

  • B) 20-30%

  • C) 50%+

  • D) 100%

What is the trade-off when using deeper C-states (C3+)?

  • A) Higher frequency requirements

  • B) Increased latency to wake up and resume execution

  • C) More power consumption

  • D) Inability to access memory

Thermal and System Power Management:

What are T-states used for in CPU power management?

  • A) Temperature sensing

  • B) Throttling execution through clock duty-cycle modulation during thermal stress (thermal throttling)

  • C) Enabling turbo boost

  • D) Managing core voltage

What does S5 represent in the system power state (S-state) hierarchy?

  • A) System running state

  • B) System suspended

  • C) System fully powered off

  • D) System in thermal shutdown

ACPI and Scaling Drivers:

Which of the following is NOT a scaling driver mentioned in Episode 0?

  • A) acpi-cpufreq

  • B) intel_pstate

  • C) amd-pstate

  • D) gpu-pstate

What is the primary role of a scaling governor?

  • A) Measure CPU temperature

  • B) Decide which P-state (frequency) to use based on workload conditions

  • C) Manage system fans

  • D) Control memory frequency

Why Power Management Matters in HPC:

At the facility level, what share of a data center's operational costs does electricity typically account for?

  • A) 5-10% of operational costs

  • B) 20-30% of operational costs

  • C) 50-70% of operational costs

  • D) 90%+ of operational costs

Approximately what fraction of power in a large HPC system goes to cooling?

  • A) 10%

  • B) 25%

  • C) 50%

  • D) 75%

Which of the following is NOT a benefit of power management in HPC?

  • A) Reduced electricity costs

  • B) Improved multi-tenancy isolation

  • C) Guaranteed faster execution time

  • D) Decreased environmental impact

Conceptual questions

Power Equation Analysis: A processor running at 2.0 GHz with voltage 0.8V consumes P₁ Watts of dynamic power. If you reduce frequency to 1.6 GHz and can also reduce voltage to 0.7V, calculate the power reduction ratio P_new/P₁. Assume capacitance C remains constant. Show all steps and explain why voltage scaling is crucial.
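With C held constant, the capacitance cancels and only the voltage and frequency ratios remain. Here is a worked form for checking the arithmetic:

```latex
\frac{P_{\text{new}}}{P_1}
  = \frac{C\,V_{\text{new}}^2\,f_{\text{new}}}{C\,V_1^2\,f_1}
  = \left(\frac{0.7}{0.8}\right)^2 \cdot \frac{1.6}{2.0}
  = 0.765625 \times 0.8
  = 0.6125
```

Frequency scaling alone would cut dynamic power by 20%. Adding the voltage reduction brings the cut to roughly 39%, which is why voltage scaling is the crucial half of DVFS.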

Workload and Power Interaction: Consider two workloads: (1) an all-reduce collective communication pattern (communication- and memory-bandwidth-bound) and (2) a dense matrix multiplication (compute-bound). For each workload, explain:

  • Whether frequency reduction will hurt performance significantly

  • Why it might be safe/unsafe to reduce frequency

  • What monitoring you would use to validate your strategy

Power Management Strategy Design: You are responsible for power management on a 1000-node HPC cluster where 30% of jobs are batch (deadline-loose), 50% are interactive (need low latency), and 20% are GPU-accelerated (compute-intensive). Design a power management strategy that:

  • Identifies which power management knobs to use

  • Proposes different settings for each job type

  • Explains trade-offs between performance and energy

Episode 1: Power Management Implementation and Runtime Systems

Multiple choice, single answer

Scaling Drivers and Interfaces:

Which driver provides more responsive CPU frequency control on modern Intel processors?

  • A) acpi-cpufreq

  • B) intel_pstate

  • C) ondemand-governor

  • D) They are equivalent

What does the max_perf_pct parameter in intel_pstate sysfs control?

  • A) Maximum percentage of cores to activate

  • B) Maximum allowed P-state as a percentage of maximum frequency

  • C) Maximum percentage of memory bandwidth

  • D) Maximum core temperature

To disable turbo boost on an intel_pstate system, which sysfs file should be written to?

  • A) scaling_max_freq

  • B) turbo_boost

  • C) no_turbo

  • D) turbo_disabled
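On a system where intel_pstate is the active driver, both knobs from the last two questions live under /sys/devices/system/cpu/intel_pstate/. The sketch below reads and writes them and assumes root privileges. The helper names are hypothetical, and the base directory is a parameter so the functions can be exercised against a scratch directory.

```python
from pathlib import Path

# Global intel_pstate attributes (max_perf_pct, min_perf_pct, no_turbo, ...).
INTEL_PSTATE_DIR = Path("/sys/devices/system/cpu/intel_pstate")

def read_knob(name: str, base: Path = INTEL_PSTATE_DIR) -> str:
    """Return the current value of an intel_pstate attribute as a string."""
    return (base / name).read_text().strip()

def write_knob(name: str, value: int, base: Path = INTEL_PSTATE_DIR) -> None:
    """Write an integer to an intel_pstate attribute (needs root on a real system)."""
    (base / name).write_text(f"{value}\n")

def cap_performance(max_pct: int, disable_turbo: bool,
                    base: Path = INTEL_PSTATE_DIR) -> None:
    """Limit the highest P-state to max_pct percent and optionally switch turbo off."""
    if not 0 < max_pct <= 100:
        raise ValueError("max_perf_pct must be in (0, 100]")
    write_knob("max_perf_pct", max_pct, base)
    # no_turbo = 1 disables turbo; 0 re-enables it.
    write_knob("no_turbo", 1 if disable_turbo else 0, base)
```

For example, `cap_performance(80, disable_turbo=True)` followed by `read_knob("no_turbo")` should report `1` on a machine where the write succeeded.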

Scaling Governors (Policies):

Which scaling governor always runs at maximum frequency?

  • A) powersave

  • B) performance

  • C) ondemand

  • D) conservative

What is the main advantage of the ondemand governor compared to conservative?

  • A) Lower power consumption

  • B) Faster response to load increases

  • C) Better for real-time systems

  • D) Requires less configuration

Which governor allows direct user/application control of CPU frequency?

  • A) performance

  • B) powersave

  • C) userspace

  • D) ondemand

In the ondemand governor, what does up_threshold control?

  • A) CPU utilization threshold for scaling frequency up

  • B) Maximum frequency cap

  • C) Minimum frequency floor

  • D) Temperature threshold
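The ondemand rule can be modelled in a few lines. This is a simplified sketch of the behaviour of recent Linux kernels: jump straight to the maximum when sampled load exceeds up_threshold, otherwise pick a frequency proportional to load between the hardware minimum and maximum. It ignores sampling_down_factor, powersave_bias and the sampling period itself. The default of 80 for up_threshold is an assumption worth checking under /sys/devices/system/cpu/cpufreq/ondemand/ on your own system.

```python
def ondemand_next_freq(load_pct: float, min_khz: int, max_khz: int,
                       up_threshold: int = 80) -> int:
    """Simplified ondemand decision: max above up_threshold, else proportional to load."""
    if not 0 <= load_pct <= 100:
        raise ValueError("load must be a percentage")
    if load_pct > up_threshold:
        return max_khz  # aggressive step up on a load spike
    # Below the threshold, interpolate linearly between min and max.
    return int(min_khz + load_pct * (max_khz - min_khz) / 100)

for load in (10, 50, 79, 81, 95):
    print(load, ondemand_next_freq(load, 800_000, 3_600_000))
```

Lowering up_threshold makes the governor reach maximum frequency sooner, trading energy for responsiveness.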

Hardware Frequency Control (MSR):

What is the MSR (Model-Specific Register) address for IA32_PERF_CTL?

  • A) 0x199

  • B) 0x770

  • C) 0x610

  • D) 0x620

Which bit field in MSR 0x199 specifies the target P-state?

  • A) Bits [7:0]

  • B) Bits [15:8]

  • C) Bits [31:16]

  • D) Bits [63:32]

Intel Turbo Boost:

What is the key difference between SSE and AVX-512 boost frequencies?

  • A) AVX-512 has the highest boost frequency

  • B) SSE has the highest boost frequency due to lower power density

  • C) They are identical

  • D) AVX-512 doesn’t support boost

Why does Intel implement instruction-set-specific frequency levels?

  • A) To comply with firmware limitations

  • B) To allow higher compute throughput within power and thermal budgets

  • C) To maintain thermal stability

  • D) To prevent privilege escalation

Frequency Transition Latency:

What is a typical frequency transition latency on modern Intel processors?

  • A) 1-5 microseconds

  • B) 5-20 microseconds

  • C) 100-500 microseconds

  • D) 1-10 milliseconds

For which type of application is frequency transition latency most critical?

  • A) Batch HPC jobs (hours-long)

  • B) Real-time embedded systems

  • C) General server workloads

  • D) Interactive web applications

GPU Frequency Management:

What command-line tool is used to control NVIDIA GPU frequency?

  • A) rocm-smi

  • B) nvidia-smi

  • C) gpu-control

  • D) pstate-set

Which AMD tool is used for GPU frequency management on AMD GPUs?

  • A) nvidia-smi

  • B) rocm-smi

  • C) amd-power

  • D) frequency-control

What is the typical frequency granularity for NVIDIA GPUs?

  • A) 1 MHz

  • B) 5-10 MHz

  • C) 25-50 MHz

  • D) 100+ MHz

Hardware P-State (HWP) and SpeedShift:

How does HWP differ from OS-controlled frequency scaling?

  • A) HWP is slower

  • B) Hardware autonomously selects P-states within OS-specified range

  • C) HWP only works for memory

  • D) They are the same thing

What is the approximate latency improvement of HWP over OS control?

  • A) 2-5× faster

  • B) 5-10× faster

  • C) 10-100× faster

  • D) HWP is actually slower

Energy-Performance Preference (EPP):

What does MSR IA32_ENERGY_PERF_BIAS (0x1B0) allow?

  • A) Setting exact frequency values

  • B) Specifying energy-performance trade-off preference (0-15 scale)

  • C) Disabling frequency scaling

  • D) Monitoring power consumption

On modern Intel (Skylake+), which MSR provides finer-grained EPP control with 0-255 scale?

  • A) 0x199

  • B) 0x620

  • C) 0x774

  • D) 0x1B0
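The 0-255 EPP value is one of four 8-bit fields packed into IA32_HWP_REQUEST (0x774). The sketch below packs them, assuming the Intel SDM field layout: [7:0] minimum, [15:8] maximum, [23:16] desired, [31:24] EPP. The activity-window and package-control bits above bit 31 are left at zero, and the helper names are hypothetical. On Linux, intel_pstate normally manages this register through the energy_performance_preference attribute, so raw writes are mainly useful for experiments.

```python
IA32_HWP_REQUEST = 0x774

def encode_hwp_request(min_perf: int, max_perf: int,
                       desired: int = 0, epp: int = 128) -> int:
    """Pack the low 32 bits of IA32_HWP_REQUEST (assumed SDM layout).

    desired = 0 lets the hardware choose; epp = 0 favours performance,
    epp = 255 favours energy saving.
    """
    for name, val in (("min_perf", min_perf), ("max_perf", max_perf),
                      ("desired", desired), ("epp", epp)):
        if not 0 <= val <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    if min_perf > max_perf:
        raise ValueError("min_perf must not exceed max_perf")
    return min_perf | (max_perf << 8) | (desired << 16) | (epp << 24)

def decode_epp(value: int) -> int:
    """Extract the EPP field, bits [31:24]."""
    return (value >> 24) & 0xFF

req = encode_hwp_request(min_perf=8, max_perf=38, epp=128)
print(hex(req), decode_epp(req))
```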

CPU Uncore Frequency:

Approximately what percentage of CPU chip area does the uncore subsystem consume?

  • A) 10%

  • B) 20%

  • C) 30%

  • D) 50%

What is MSR MSR_UNCORE_RATIO_LIMIT used for?

  • A) Controlling core frequency only

  • B) Setting limits on uncore (shared subsystem) frequency

  • C) Measuring power consumption

  • D) Detecting thermal issues
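MSR_UNCORE_RATIO_LIMIT (0x620) holds two 7-bit ratio fields. The sketch below assumes the layout used by Linux's intel-uncore-frequency driver, with the maximum ratio in bits [6:0] and the minimum ratio in bits [14:8]. It also assumes a 100 MHz reference clock. Verify both assumptions against the SDM for your processor generation. Where that driver is loaded, the same limits are usually exposed in kHz under /sys/devices/system/cpu/intel_uncore_frequency/, which is safer than raw MSR writes.

```python
MSR_UNCORE_RATIO_LIMIT = 0x620
BCLK_MHZ = 100  # assumed reference clock: ratio 24 -> 2.4 GHz

def decode_uncore_limits(value: int) -> tuple[int, int]:
    """Return (min_mhz, max_mhz) from a raw MSR_UNCORE_RATIO_LIMIT value."""
    max_ratio = value & 0x7F          # bits [6:0]
    min_ratio = (value >> 8) & 0x7F   # bits [14:8]
    return min_ratio * BCLK_MHZ, max_ratio * BCLK_MHZ

def encode_uncore_limits(value: int, min_mhz: int, max_mhz: int) -> int:
    """Read-modify-write helper: replace both ratio fields, keep every other bit."""
    min_ratio, max_ratio = min_mhz // BCLK_MHZ, max_mhz // BCLK_MHZ
    if not 0 < min_ratio <= max_ratio <= 0x7F:
        raise ValueError("ratios must satisfy 0 < min <= max <= 127")
    cleared = value & ~0x7F7F
    return cleared | (min_ratio << 8) | max_ratio

print(decode_uncore_limits(0x0C18))  # raw value read from the MSR
```

Capping the uncore maximum is a common knob for compute-bound codes that do not need full cache and memory-controller bandwidth.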

Workload Characterization:

Which workload type would most benefit from frequency reduction without performance loss?

  • A) Dense matrix multiplication

  • B) Sparse linear solver (memory-latency-bound)

  • C) Real-time signal processing

  • D) Latency-sensitive trading system

What metric helps predict whether a workload is compute-bound or memory-bound?

  • A) Latency

  • B) Throughput

  • C) Arithmetic intensity (operations per memory access)

  • D) Frequency

Intel RAPL Power Capping:

How many power domains does Intel RAPL typically support?

  • A) 1-2

  • B) 2-3

  • C) 3-5

  • D) 5+

Which RAPL domain is specific to server architectures?

  • A) Package

  • B) Core (PP0)

  • C) DRAM

  • D) Graphics (PP1)

What are the two time windows in Intel RAPL Package domain?

  • A) 1 ms and 10 ms

  • B) Short (~1.2× TDP, ms) and Long (~TDP, seconds)

  • C) 1 second and 10 seconds

  • D) 100 ms and 1 second

What does MSR MSR_PKG_POWER_LIMIT (0x610) control?

  • A) Current power consumption

  • B) Maximum and minimum frequency

  • C) Power capping limits and time windows

  • D) Thermal shutdown temperature
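The register packs two limits into one 64-bit value: PL1 (the long window) and PL2 (the short window). Each limit has an enable bit, a clamp bit, and a 7-bit time-window field. The decoding sketch below assumes the Intel SDM layout. Power and time units come from MSR_RAPL_POWER_UNIT (0x606) and are passed in as plain numbers here.

```python
def _time_window_s(field: int, time_unit_s: float) -> float:
    """7-bit window field: Y = low 5 bits, Z = high 2 bits; window = 2^Y * (1 + Z/4) * unit."""
    y = field & 0x1F
    z = (field >> 5) & 0x3
    return (2 ** y) * (1 + z / 4) * time_unit_s

def decode_pkg_power_limit(value: int, power_unit_w: float, time_unit_s: float) -> dict:
    """Decode MSR_PKG_POWER_LIMIT (0x610) into PL1/PL2 settings."""
    return {
        "pl1_w": (value & 0x7FFF) * power_unit_w,                          # bits [14:0]
        "pl1_enabled": bool((value >> 15) & 1),                            # bit 15
        "pl1_clamp": bool((value >> 16) & 1),                              # bit 16
        "pl1_window_s": _time_window_s((value >> 17) & 0x7F, time_unit_s), # bits [23:17]
        "pl2_w": ((value >> 32) & 0x7FFF) * power_unit_w,                  # bits [46:32]
        "pl2_enabled": bool((value >> 47) & 1),                            # bit 47
        "pl2_clamp": bool((value >> 48) & 1),                              # bit 48
        "pl2_window_s": _time_window_s((value >> 49) & 0x7F, time_unit_s), # bits [55:49]
        "locked": bool((value >> 63) & 1),                                 # bit 63
    }
```

For example, take a power unit of 1/8 W and a time unit of 1/1024 s, the units encoded by 0xA1003 in the RAPL question. With those units, a raw PL1 field of 1600 decodes to 200 W, and a window field of 10 decodes to 1 s.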

Case Studies and Advanced Platforms:

In the Cascade Lake case study, what percentage CPU energy savings was achieved with frequency scaling?

  • A) 5-10%

  • B) 18%

  • C) 30%

  • D) 50%

Why is power management challenging on Grace Hopper?

  • A) No frequency scaling available

  • B) Multiple power domains (CPU, GPU, interconnects) require coordination

  • C) GPU power cannot be measured

  • D) Frequency is fixed at 2.2 GHz

On RIKEN Fugaku’s A64FX, what does FPU elimination in ECO mode do?

  • A) Disables floating-point calculations entirely

  • B) Reduces frequency by 50%

  • C) Uses one of two FPU pipelines only, reducing power

  • D) Moves computations to GPU

Runtime Systems and Strategies:

What does a power-capping runtime system do?

  • A) Forces all jobs to use the same frequency

  • B) Ensures total node power doesn’t exceed a limit while maximizing performance

  • C) Measures power for accounting purposes only

  • D) Disables turbo boost

Which power management strategy is most suitable for a tightly power-constrained HPC facility?

  • A) Fixed frequency for all jobs

  • B) Per-application tuning

  • C) Dynamic runtime control with power budgeting

  • D) No power management (always maximum)

What is the typical energy savings range for dynamic runtime power management?

  • A) 5-10%

  • B) 10-20%

  • C) 20-40%

  • D) 50%+

Coding and analysis questions

MSR-Based Frequency Control: The IA32_PERF_CTL register (MSR 0x199) controls CPU frequency. Assume a system has P-states 0-39, where:

  • P0 = 3.8 GHz (turbo)

  • P1 = 3.6 GHz (nominal)

  • P39 = 0.8 GHz (minimum)

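Given that the target P-state is specified in bits [15:8] (on recent Intel processors this field holds the frequency ratio in 100 MHz units, so a ratio of 20 requests 2.0 GHz):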

  • a) Write the MSR value in hex to set CPU to P24 (2.0 GHz)

  • b) Explain how to read current frequency from P-state number

  • c) Design pseudocode to linearly scale frequency from current to 1.8 GHz over 10 steps
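Here is a sketch of parts (a) and (c) in Python. It rests on several assumptions:

- The Linux msr driver is loaded (modprobe msr) and the script runs as root.
- Bits [15:8] hold a frequency ratio in 100 MHz units, as on recent Intel processors. The question's abstract P-state numbers would therefore first need a lookup table from P-state to ratio.
- IA32_PERF_STATUS (0x198) reports the current ratio in the same bits.

The function names are hypothetical. The OS scaling driver may overwrite these writes, and processors with HWP enabled ignore IA32_PERF_CTL, so treat this as an experiment on a test node.

```python
import os
import time

IA32_PERF_STATUS = 0x198  # assumed: current ratio in bits [15:8]
IA32_PERF_CTL = 0x199
BCLK_MHZ = 100            # assumed bus clock: ratio 20 -> 2.0 GHz

def perf_ctl_value(ratio: int, current: int = 0) -> int:
    """Return `current` with bits [15:8] replaced by `ratio` (read-modify-write)."""
    if not 0 <= ratio <= 0xFF:
        raise ValueError("ratio must fit in 8 bits")
    return (current & ~0xFF00) | (ratio << 8)

def read_msr(cpu: int, reg: int) -> int:
    """Read a 64-bit MSR through the Linux msr driver (file offset = MSR address)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return int.from_bytes(os.pread(fd, 8, reg), "little")
    finally:
        os.close(fd)

def write_msr(cpu: int, reg: int, value: int) -> None:
    """Write a 64-bit MSR through the Linux msr driver."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, value.to_bytes(8, "little"), reg)
    finally:
        os.close(fd)

def ramp_targets(start_mhz: float, end_mhz: float, steps: int) -> list[int]:
    """Linearly spaced targets after each step (ending exactly at end_mhz), as whole ratios."""
    return [round((start_mhz + (end_mhz - start_mhz) * i / steps) / BCLK_MHZ)
            for i in range(1, steps + 1)]

def ramp(cpu: int, end_mhz: float, steps: int = 10, dwell_s: float = 0.01) -> None:
    """Step one CPU from its current ratio to end_mhz, pausing dwell_s between writes."""
    start_mhz = ((read_msr(cpu, IA32_PERF_STATUS) >> 8) & 0xFF) * BCLK_MHZ
    for ratio in ramp_targets(start_mhz, end_mhz, steps):
        ctl = read_msr(cpu, IA32_PERF_CTL)
        write_msr(cpu, IA32_PERF_CTL, perf_ctl_value(ratio, ctl))
        time.sleep(dwell_s)
```

Reading IA32_PERF_CTL before each write keeps the register's other bits intact. Each intermediate step is rounded to the nearest whole ratio, so 10 steps from 3.6 GHz to 1.8 GHz become ratios 34, 32, 31, 29, 27, 25, 23, 22, 20, 18.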

Scaling Governor Selection: You have three workloads:

  1. Scientific simulation: CPU-intensive, 8-hour runtime, deadline = 9 hours

  2. Data processing pipeline: 30% communication, 70% compute, continuously running

  3. Interactive visualization: Variable workload, < 100ms latency requirement

For each workload:

  • a) Recommend a scaling governor (performance, powersave, ondemand, conservative, userspace)

  • b) Justify your choice with reasoning about workload characteristics

  • c) Propose specific tuning parameters (e.g., up_threshold, sampling_rate)

Power Equation Application: A CPU core operates at V = 0.9V, f = 2.5 GHz with an effective switched capacitance C ≈ 27.7 nF. Dynamic power = CV²f ≈ 56W. Leakage power ≈ 4W. Total P = 60W.

  • a) If you reduce to f = 2.0 GHz and V can be scaled to 0.8V, calculate new power

  • b) What is the energy savings per 1-hour job?

  • c) At $0.10/kWh, what is the annual cost savings if running 500 jobs/day?
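Here is a sketch for parts (a) through (c). It scales the given 56 W of dynamic power by (V ratio)²·(f ratio) and the 4 W of leakage by the V ratio (treating I_leak as constant), so it does not depend on the capacitance value. It also assumes the job still takes exactly one hour at the lower frequency. A compute-bound job would run longer and claw back part of the saving, so treat the result as an upper bound.

```python
def scaled_power(p_dyn_w: float, p_leak_w: float, v_ratio: float, f_ratio: float) -> float:
    """Dynamic power scales with V^2 * f; leakage I_leak * V (fixed I_leak) scales with V."""
    return p_dyn_w * v_ratio**2 * f_ratio + p_leak_w * v_ratio

def annual_savings_usd(p_old_w: float, p_new_w: float, job_hours: float,
                       jobs_per_day: int, usd_per_kwh: float) -> float:
    """Energy saved per job (kWh) times jobs per year times tariff."""
    kwh_per_job = (p_old_w - p_new_w) * job_hours / 1000.0
    return kwh_per_job * jobs_per_day * 365 * usd_per_kwh

p_old = 56.0 + 4.0
p_new = scaled_power(56.0, 4.0, v_ratio=0.8 / 0.9, f_ratio=2.0 / 2.5)
savings = annual_savings_usd(p_old, p_new, job_hours=1.0, jobs_per_day=500, usd_per_kwh=0.10)
print(f"new power: {p_new:.2f} W, saved per job: {(p_old - p_new) / 1000:.4f} kWh")
print(f"annual savings: ${savings:.2f}")
```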

Intel RAPL Analysis: Assume you read the following RAPL MSR values:

  Measurement                                  Value
  -------------------------------------------  ----------
  MSR_RAPL_POWER_UNIT (0x606)                  0xA1003
  MSR_PKG_ENERGY_STATUS (0x611) at t₀ = 0s     0x2A5B0E00
  MSR_PKG_ENERGY_STATUS (0x611) at t₁ = 60s    0x3F7F0E00

Using the MSR format: energy_unit = 2^(-ESU) Joules, where ESU is the Energy Status Units field in bits [12:8] of MSR_RAPL_POWER_UNIT

  • a) Decode MSR_RAPL_POWER_UNIT to get energy unit in Joules

  • b) Calculate energy consumed between t₀ and t₁

  • c) Calculate average power during this interval

  • d) If power was capped at 200W, was it exceeded?
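The decoding steps can be checked with a short sketch. It reads the Energy Status Units from bits [12:8] of MSR_RAPL_POWER_UNIT. It treats MSR_PKG_ENERGY_STATUS as a 32-bit counter, so the subtraction is masked to survive a single wraparound between samples.

```python
def energy_unit_joules(power_unit_msr: int) -> float:
    """Energy Status Units (ESU) are bits [12:8]; one counter tick = 2^-ESU joules."""
    esu = (power_unit_msr >> 8) & 0x1F
    return 2.0 ** -esu

def energy_delta_joules(start: int, end: int, unit_j: float) -> float:
    """Energy between two 32-bit counter samples; correct if it wrapped at most once."""
    ticks = (end - start) & 0xFFFFFFFF
    return ticks * unit_j

unit = energy_unit_joules(0xA1003)
energy = energy_delta_joules(0x2A5B0E00, 0x3F7F0E00, unit)
avg_w = energy / 60.0
print(f"unit = {unit:.3e} J, energy = {energy:.1f} J, average = {avg_w:.1f} W")
print("exceeds 200 W cap" if avg_w > 200 else "within 200 W cap")
```

An average well under the cap does not rule out short excursions above it. Checking for those needs samples at a finer interval than 60 s.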

HWP vs OS Control Comparison: You are optimizing a 10-second latency-sensitive application.

  • Scenario A: OS-controlled frequency, 50 μs per frequency change, 4 changes needed

  • Scenario B: HWP hardware-controlled, 2 μs per P-state selection, 4 changes needed

  • a) Calculate total latency overhead for each scenario

  • b) Compute performance impact as percentage of 10-second deadline

  • c) Explain why HWP is preferable for this application

Case Study Analysis - Cascade Lake: From Episode 1, Cascade Lake achieves 18% CPU energy savings with frequency scaling on AVX-512 workloads at arithmetic intensity 8.

  • a) If a node’s baseline power is 100W CPU + 50W memory, calculate new CPU power with 18% savings

  • b) Total node power (CPU, memory, and all other components) = 350W baseline. Assume 15% node savings. Calculate new total node power

  • c) For 1000 nodes running 24/7, compute annual energy cost savings at $0.12/kWh

  • d) Estimate payback period if implementing power management costs $500,000 in software development
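Parts (a) through (d) reduce to a few lines of arithmetic. The sketch counts only the node's own electricity, with no cooling overhead (PUE). It also takes the 15% node-level saving as given. Both are simplifying assumptions.

```python
HOURS_PER_YEAR = 24 * 365

def annual_cost_savings(baseline_w: float, savings_frac: float,
                        nodes: int, usd_per_kwh: float) -> float:
    """Saved kW across the cluster, running 24/7 for a year, times the tariff."""
    saved_kw = baseline_w * savings_frac * nodes / 1000.0
    return saved_kw * HOURS_PER_YEAR * usd_per_kwh

cpu_new_w = 100.0 * (1 - 0.18)   # (a) CPU power after 18% savings
node_new_w = 350.0 * (1 - 0.15)  # (b) node power after 15% savings
savings_usd = annual_cost_savings(350.0, 0.15, nodes=1000, usd_per_kwh=0.12)  # (c)
payback_years = 500_000 / savings_usd                                         # (d)
print(f"CPU: {cpu_new_w:.1f} W, node: {node_new_w:.1f} W")
print(f"annual savings: ${savings_usd:,.0f}, payback: {payback_years:.1f} years")
```

Including cooling overhead would raise the savings and shorten the payback period.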

Runtime System Design - Power Budget Allocation: Design a runtime system that allocates a 500W power budget among 4 cores, each running different workload types:

  Core     Workload Type   Compute Intensity (ops/byte)   Baseline Power (W)
  -------  --------------  -----------------------------  ------------------
  Core 0   Compute-bound   8                              60
  Core 1   Memory-bound    0.5                            40
  Core 2   I/O-bound       0.1                            30
  Core 3   Balanced        2                              50

  • a) Estimate how much frequency reduction each core can tolerate without significant performance loss

  • b) Design an algorithm to allocate the 500W budget to maximize throughput

  • c) Write pseudocode for dynamic reallocation if a core becomes idle

  • d) Propose how to handle thermal constraints (max 90°C core temperature)
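One way to approach parts (b) and (c) is weighted water-filling. Every core first gets a power floor. The rest of the budget is then split in proportion to how much each core benefits from extra frequency, and whatever a capped core cannot use is re-split among the others. The sketch below is hypothetical throughout: the sensitivity weight AI / (AI + 4), the floors (50% of baseline) and the ceilings (130% of baseline) are illustrative assumptions, not measured values. Thermal handling (part d) could be folded in by lowering a core's max_w whenever it approaches 90°C.

```python
from dataclasses import dataclass

@dataclass
class Core:
    name: str
    intensity: float   # arithmetic intensity, ops/byte (from the table)
    min_w: float       # power floor: lowest P-state or idle power (assumed)
    max_w: float       # power ceiling: turbo or thermal limit (assumed)
    idle: bool = False

def sensitivity(core: Core, ridge: float = 4.0) -> float:
    """Hypothetical weight in [0, 1): small for memory/I-O bound, near 1 for compute bound."""
    return 0.0 if core.idle else core.intensity / (core.intensity + ridge)

def allocate(cores: list[Core], budget_w: float) -> dict[str, float]:
    """Grant every core its floor, then water-fill the rest by sensitivity, never exceeding max_w."""
    alloc = {c.name: c.min_w for c in cores}
    remaining = budget_w - sum(alloc.values())
    if remaining < 0:
        raise ValueError("budget is below the sum of per-core floors")
    active = [c for c in cores if sensitivity(c) > 0 and alloc[c.name] < c.max_w]
    while remaining > 1e-9 and active:
        total = sum(sensitivity(c) for c in active)
        spent = 0.0
        for c in active:
            grant = min(remaining * sensitivity(c) / total, c.max_w - alloc[c.name])
            alloc[c.name] += grant
            spent += grant
        remaining -= spent
        # Cores that hit their ceiling drop out; the leftover is re-split next round.
        active = [c for c in active if alloc[c.name] < c.max_w - 1e-9]
    return alloc

def on_core_idle(cores: list[Core], name: str, budget_w: float) -> dict[str, float]:
    """Mark a core idle (it keeps only its floor) and redistribute the budget."""
    for c in cores:
        if c.name == name:
            c.idle = True
    return allocate(cores, budget_w)

table = [("core0", 8.0, 60.0), ("core1", 0.5, 40.0), ("core2", 0.1, 30.0), ("core3", 2.0, 50.0)]
# Assumed floors and ceilings: 50% and 130% of the baseline power in the table.
cores = [Core(n, ai, 0.5 * p, 1.3 * p) for n, ai, p in table]
for name, watts in allocate(cores, 200.0).items():
    print(f"{name}: {watts:.1f} W")
```

With the table's numbers, the assumed ceilings sum to 234 W, so a 500 W budget never binds and every core simply runs at its ceiling. A tighter budget such as 200 W shows the weighting at work, with the compute-bound core 0 saturating first.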

Workload Characterization and Power Optimization: Given profile data for a Lattice Boltzmann Method (LBM) simulation:

  Metric                                       Value
  -------------------------------------------  ----------
  Memory accesses per 1000 cycles              450
  Floating-point operations per 1000 cycles    1800
  L3 cache hit rate                            85%
  Memory latency (cache miss)                  200 cycles
  Frequency                                    2.5 GHz

  • a) Calculate arithmetic intensity (flops per byte accessed; state your assumption about bytes per memory access, e.g. 8 bytes for double precision)

  • b) Is this workload compute-bound or memory-bound? Justify.

  • c) Estimate the performance impact of 20% frequency reduction

  • d) Propose a power management strategy specific to this workload type
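Here is a sketch of parts (a) through (c), built on several assumptions:

- Each memory access moves 8 bytes (double precision). Counting only L3 misses as DRAM traffic would give a different intensity, so state which definition you use.
- The roofline comparison uses a hypothetical machine with 1000 GFLOP/s peak and 200 GB/s memory bandwidth.
- Only the compute-bound share of runtime stretches with 1/f in the slowdown model.
- The 30% compute share used below is an assumption, not derived from the profile.

```python
def arithmetic_intensity(flops: float, accesses: float, bytes_per_access: float = 8.0) -> float:
    """Flops per byte moved, assuming a fixed size per memory access."""
    return flops / (accesses * bytes_per_access)

def roofline_bound(ai: float, peak_gflops: float, bw_gbs: float) -> str:
    """Compare intensity with the ridge point where the compute and bandwidth roofs meet."""
    ridge = peak_gflops / bw_gbs  # flops/byte
    return "memory-bound" if ai < ridge else "compute-bound"

def slowdown_at(freq_ratio: float, compute_fraction: float) -> float:
    """Runtime multiplier when only the compute-bound share of time scales with 1/f."""
    return compute_fraction / freq_ratio + (1.0 - compute_fraction)

ai = arithmetic_intensity(flops=1800, accesses=450)  # per 1000 cycles
print(f"AI = {ai:.2f} flop/byte -> {roofline_bound(ai, peak_gflops=1000, bw_gbs=200)}")
print(f"runtime at 80% frequency: {slowdown_at(0.8, compute_fraction=0.3):.3f}x")
```

A fully memory-bound code would see no slowdown in this model, and a fully compute-bound one would run 1.25× longer. Measured counters should decide where LBM actually falls between those extremes.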