I’m trying to stress the GPU for testing purposes (I’m using gpu-burn for this task), but I have observed that in tegrastats GR3D_FREQ reaches 99% while the GPU and CV temps remain at -256C.
I looked this issue up on these forums and found answers like this which explain that the -256C reading appears because the GPU is power gated: when it is not in use, its power is shut off, so the temperature sensors return no reading. I see recommendations to use the “Jetson Power GUI” to turn on the GPU power.
One issue: I am accessing this Orin over SSH, and it has no monitor or keyboard directly attached. Is there a tool or way to modify the power management so that the GPU is turned on and I can see the temperature readings?
An additional side question: I don’t understand how the GPU can be “off” but GR3D_FREQ is at 99%. Does the power only disconnect the temperature sensors, not the GPU itself?
Thank you for any clarification or assistance on this issue!
I’ve tried to verify it locally.
It seems matrixMul finishes too quickly for tegrastats to sample it.
Please try running it in a “while 1” loop.
With that, I could see both the GR3D_FREQ value and the GPU temperature go up.
Apologies but I am a little bit lost. When you say to use “while 1” to run “it” in a loop, what is it that I should be running in a loop? gpu-burn? Or some other tool?
I ran matrixMul from cuda-samples to check whether GR3D_FREQ and the GPU temperature show values.
If I run matrixMul only once, I can’t get the values, possibly because it finishes too quickly for tegrastats to sample it.
So I wrote a script to run it in an infinite loop, and then I got the expected results.
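For anyone following along, a loop of this kind might look roughly like the sketch below. It is not the exact script used above: the function name run_loop and the ./matrixMul path are placeholders, and a count of 0 is just one way to express "loop until interrupted".

```shell
#!/usr/bin/env bash
# run_loop COUNT CMD [ARGS...]
# Runs CMD back to back COUNT times; COUNT=0 means loop until interrupted (Ctrl+C).
# run_loop is a hypothetical helper name, not part of any NVIDIA tool.
run_loop() {
    local count="$1"; shift
    local i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        # Discard the program's output; stop looping if it fails.
        "$@" > /dev/null || return 1
        i=$((i + 1))
    done
}

# Example (the path to the matrixMul binary is an assumption):
#   run_loop 0 ./matrixMul
# In a second SSH session, watch the readings with: sudo tegrastats
```

Keeping the GPU busy continuously gives tegrastats a chance to take a sample while the workload is still running.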
Ah ok. I was using gpu-burn not matrixMul, but regardless I tried running a loop of matrixMul and the GPU temperature did show up, so I’m wondering if the issue is with gpu-burn. Are you able to test and see if the GPU temperature doesn’t show up? In both cases, the GR3D_FREQ value increases.
I still don’t understand how GR3D_FREQ can show up as 99% (implying the GPU is under load and running) but the GPU temperature is at -256. This is very confusing.
git clone https://github.com/wilicc/gpu-burn
cd gpu-burn
make
And then run the command:
./gpu_burn -m 40% 60
However, now I am seeing GPU temperatures appear in tegrastats. I am wondering if I had been running tegrastats without sudo and that was the cause? I thought I had used sudo before, but perhaps not. That may be the whole solution to this issue 😓️.
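As a cross-check over SSH that does not depend on tegrastats, the standard Linux thermal sysfs entries can be read directly. The sketch below assumes the board exposes its sensors under /sys/class/thermal with temperatures in millidegrees Celsius. The function name print_thermal_zones is made up for illustration, and zone names vary by board.

```shell
#!/usr/bin/env bash
# print_thermal_zones [BASE_DIR]
# Prints "<zone type> <temperature in C>" for every thermal zone found.
# BASE_DIR defaults to /sys/class/thermal (assumed layout: thermal_zoneN/type and thermal_zoneN/temp).
print_thermal_zones() {
    local base="${1:-/sys/class/thermal}"
    local zone name milli
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/type" ] || continue
        name=$(cat "$zone/type")
        # Reading temp may fail if a sensor is unavailable; report that instead of aborting.
        if milli=$(cat "$zone/temp" 2>/dev/null); then
            # Convert millidegrees to degrees with one decimal place.
            awk -v n="$name" -v t="$milli" 'BEGIN { printf "%s %.1f\n", n, t / 1000 }'
        else
            echo "$name unavailable"
        fi
    done
}

# Example: print_thermal_zones
```

Running this while gpu-burn is active, both with and without sudo, would be one way to tell whether the -256C reading came from permissions or from the sensor itself.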