[Jetson-TK1, JetPack 1.2] (nvprof / CUPTI) breaks power gating and causes excessive power usage

Hi all,

Let me first emphasize that this is a fairly complex, low-level bug. It is very easy to reproduce, however, and unfortunately it stops me from continuing my research. Any help or discussion is appreciated.

In short, I have a Jetson-TK1 development kit on which I am profiling CUDA kernels and power usage in order to build a GPU power model. I am monitoring the GPU rail voltage with a voltmeter. Under normal conditions (executing CUDA kernels without a profiler), the GPU rail is power-gated (turned off) about 50 ms after the last CUDA activity. It is, of course, on while kernels are executing.

However, if I execute my CUDA kernels under nvprof (a profiling tool that collects performance data such as kernel duration, L2 reads/writes and total instructions executed), the GPU power rail is not turned off 50 ms after the last CUDA kernel launch. The rail simply remains powered, at some X volts (visible on my voltmeter).

In addition, the power usage of the platform in this “bugged” state is very high. In my tests I control all frequencies (memory, CPU and GPU) while logging power usage. Idle power (no kernels executing, no CPU activity) in the “bugged” state:

(GPU freq 852 MHz)
MEM FREQ [MHz] | 40.8        | 396         | 924
POWER [W]      | 3.84410461  | 4.012469366 | 4.250477808

Similarly, idle power in the “normal” (not bugged) state, measured in the 50 ms period during which the GPU remains powered after the last CUDA kernel launch:

(GPU freq 852 MHz)
MEM FREQ [MHz] | 40.8        | 396         | 924
POWER [W]      | 1.880853261 | 2.027946738 | 2.247936968

As you can see, the platform draws roughly 2 W more in the “bugged” state at every memory frequency, so there seems to be a lot more going on inside the GPU. This is NOT due to increased CPU activity: I have forced the CPU to stay at 1.092 GHz on the LP core, and according to the ARM performance counters, no more instructions are executed per cycle in the “bugged” state than in the “normal” state.
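
For anyone who wants to quantify the gap, here is a minimal Python sketch that compares the two tables above column by column. The variable and function names are my own and purely illustrative; the numbers are the measurements quoted above.

```python
# Idle power measurements copied from the two tables above (GPU at 852 MHz).
MEM_FREQ_MHZ = [40.8, 396, 924]
POWER_BUGGED_W = [3.84410461, 4.012469366, 4.250477808]
POWER_NORMAL_W = [1.880853261, 2.027946738, 2.247936968]


def power_gap(bugged, normal):
    """Return a list of (difference in W, ratio) pairs, one per column."""
    return [(b - n, b / n) for b, n in zip(bugged, normal)]


gaps = power_gap(POWER_BUGGED_W, POWER_NORMAL_W)
for freq, (delta_w, ratio) in zip(MEM_FREQ_MHZ, gaps):
    # Print the extra draw and how many times higher it is than normal.
    print("mem {:>5} MHz: +{:.3f} W ({:.2f}x)".format(freq, delta_w, ratio))
```

The extra draw comes out at close to 2 W for all three memory frequencies, which suggests a roughly constant overhead that does not depend on the memory clock.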

Puzzled. NVIDIA please help.

Would you please run tegrastats, which monitors memory and dynamic frequency scaling statistics on Tegra devices, and post the output? It looks like this:

RAM 250/490MB (lfb 17x1MB) Carveout 12/12MB (lfb 0kB) GART 28/32MB (lfb 4MB) IRAM 36/256kB (lfb 128kb) | DFS(%@MHz): cpu [61%,45%]@835 avp 17%@100 vde 34%@216 emc 25%@333

ubuntu@tegra-ubuntu:~$ tegrastats
RAM 61/1892MB (lfb 437x4MB) cpu [0%,off,off,off]@-1 VDE 0 EDP limit 0
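
In case it helps with logging these lines alongside the power measurements, here is a rough Python sketch that pulls the RAM and CPU fields out of a tegrastats line. The regular expressions are an assumption based only on the two sample lines in this thread; the tegrastats format differs between releases, and parse_tegrastats is a hypothetical helper, not part of any NVIDIA tool.

```python
import re


def parse_tegrastats(line):
    """Extract RAM usage and CPU load/frequency from one tegrastats line.

    Assumes the layout of the sample lines quoted in this thread.
    Fields that are not found are left as None.
    """
    result = {"ram_used_mb": None, "ram_total_mb": None,
              "cpu_load_pct": None, "cpu_freq_mhz": None}

    ram = re.search(r"RAM (\d+)/(\d+)MB", line)
    if ram:
        result["ram_used_mb"] = int(ram.group(1))
        result["ram_total_mb"] = int(ram.group(2))

    cpu = re.search(r"cpu \[([^\]]*)\]@(-?\d+)", line)
    if cpu:
        # Cores reported as "off" become None; "61%" becomes 61.
        result["cpu_load_pct"] = [
            None if c.strip() == "off" else int(c.strip().rstrip("%"))
            for c in cpu.group(1).split(",")
        ]
        # The value after "@" is kept as printed (the sample above shows -1).
        result["cpu_freq_mhz"] = int(cpu.group(2))
    return result


sample = "RAM 61/1892MB (lfb 437x4MB) cpu [0%,off,off,off]@-1 VDE 0 EDP limit 0"
stats = parse_tegrastats(sample)
print(stats)
```

Note that neither sample line contains a GPU field, so this only covers RAM and CPU; the GPU rail state still has to come from the external measurement.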

The CUDA profiler disables GPU power management when collecting GPU performance counters.

The Tegra Graphics Debugger, CUPTI SDK, and PerfKit SDK all perform similar operations. This is true for both mobile and desktop tools. These tools are designed for performance profiling, not for concurrent performance and power profiling.

If this is a required feature, I recommend you file a bug through the NVIDIA registered developer program.