Activating GPU Power Rails on AGX Orin without a GUI

mason15 · May 3, 2024, 3:44pm

Hello,

I’m trying to run the GPU under load for testing (using gpu-burn). In tegrastats, GR3D_FREQ reaches 99%, but the GPU and CV temperatures remain at -256C.

I searched these forums and found answers explaining that the -256C reading appears because the GPU is power gated: when it is not in use, its power is shut off, so the temperature sensors return no reading. Those answers recommend using the “Jetson Power GUI” to turn on the GPU power.

One issue: I am accessing this Orin over SSH, and it has no monitor or keyboard directly attached. Is there a tool or way to modify the power management so that the GPU is turned on and I can see the temperature readings?

An additional side question: I don’t understand how the GPU can be “off” but GR3D_FREQ is at 99%. Does the power only disconnect the temperature sensors, not the GPU itself?

Thank you for any clarification or assistance on this issue!

KevinFFF · May 6, 2024, 3:49am

Hi mason15,

Are you using the devkit or a custom board for AGX Orin?
Which JetPack version are you using?

Please also share the output of “sudo tegrastats” so we can check further.

mason15 · May 13, 2024, 9:41pm

Hi Kevin!

> Are you using the devkit or a custom board for AGX Orin?

This is a custom board using the Orin Industrial SOM.

> Which JetPack version are you using?

I ran:

dpkg-query --show nvidia-l4t-core

And got:

nvidia-l4t-core	35.5.0-20240219203809

> Please also share the output of “sudo tegrastats” so we can check further.

05-10-2024 11:54:03 RAM 3274/54718MB (lfb 10932x4MB) SWAP 0/27359MB (cached 0MB) CPU [2%@729,0%@729,0%@729,6%@729,0%@729,0%@729,0%@729,0%@729,0%@1497,0%@1497,0%@1497,13%@1497] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] VIC_FREQ 921 APE 174 CV0@-256C CPU@52.656C Tboard@42C SOC2@49.125C Tdiode@42.5C SOC0@50.687C CV1@-256C GPU@-256C tj@52.562C SOC1@50.312C CV2@-256C VDD_GPU_SOC 2154mW/2154mW VDD_CPU_CV 718mW/718mW VIN_SYS_5V0 7862mW/7862mW NC 0mW/0mW VDDQ_VDD2_1V8AO 796mW/796mW NC 0mW/0mW
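
The two fields this thread is about can be pulled out of a tegrastats line like the one above, which is easier to read over SSH. This is only a minimal sketch that assumes the line format shown here; `gpu_fields` is a hypothetical helper name, not part of tegrastats.

```shell
#!/usr/bin/env bash
# gpu_fields: read tegrastats lines on stdin and print only the
# GR3D_FREQ and GPU@ fields, one per line.
# Hypothetical helper; assumes the field layout shown in this thread.
gpu_fields() {
    # --line-buffered keeps output flowing when fed from a live pipe.
    grep --line-buffered -oE 'GR3D_FREQ [^ ]+|GPU@[^ ]+'
}

# Example (on the Jetson): sudo tegrastats --interval 1000 | gpu_fields
```

A GPU@-256C value in this output is the placeholder reading discussed above, not a real temperature.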

KevinFFF · May 17, 2024, 3:22am

I tried to verify this locally.
It seems matrixMul finishes too quickly for tegrastats to sample it.
Please try running it in a loop (for example, with while 1).
With the loop running, I could see values for both GR3D_FREQ and the GPU temperature.

mason15 · May 17, 2024, 1:55pm

Apologies but I am a little bit lost. When you say to use “while 1” to run “it” in a loop, what is it that I should be running in a loop? gpu-burn? Or some other tool?

KevinFFF · May 20, 2024, 2:14am

I ran matrixMul from cuda-samples to check whether GR3D_FREQ and the GPU temperature report values.
If I run matrixMul just once, I can’t get the values, probably because it finishes too quickly for tegrastats to sample it.
So I wrote a script to run it in an infinite loop, and I got the expected results.

Please also try to verify with cuda-samples.
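
A loop of the kind described above could look roughly like this. It is a sketch, not the exact script used here: `run_in_loop` and the `MAX_RUNS` variable are hypothetical names, and the example path assumes matrixMul has been built from cuda-samples and sits in the current directory.

```shell
#!/usr/bin/env bash
# run_in_loop CMD [ARGS...]: rerun a command until interrupted (Ctrl-C),
# or until MAX_RUNS iterations have completed (0 or unset = no limit).
# Hypothetical helper for illustration.
run_in_loop() {
    local max="${MAX_RUNS:-0}" count=0
    while :; do
        # Discard the command's own output; stop if it fails.
        "$@" > /dev/null || return 1
        count=$((count + 1))
        if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then
            break
        fi
    done
    echo "completed $count runs"
}

# Example: run_in_loop ./matrixMul
```

Run this in one SSH session and watch “sudo tegrastats” in a second one while the loop keeps the GPU busy.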

mason15 · May 22, 2024, 9:34pm

Ah, OK. I was using gpu-burn, not matrixMul. I tried running matrixMul in a loop and the GPU temperature did show up, so I’m wondering whether the issue is with gpu-burn. Could you test gpu-burn and see whether the GPU temperature fails to show up for you as well? In both cases, the GR3D_FREQ value increases.

I still don’t understand how GR3D_FREQ can show up as 99% (implying the GPU is under load and running) but the GPU temperature is at -256. This is very confusing.

KevinFFF · May 23, 2024, 8:04am

Yes, that doesn’t look like the expected result to us.

Please share the steps you use to run gpu-burn.

mason15 · May 23, 2024, 7:15pm

Here is the setup I have been using:

Ensure libcublas and g++ are available:

sudo apt install cuda-toolkit-11-4 g++ -y

Clone the repository and build gpu-burn:

git clone https://github.com/wilicc/gpu-burn
cd gpu-burn
make

And then run the command:

./gpu_burn -m 40% 60

However, now I am seeing GPU temperatures appear in tegrastats. I wonder whether I was previously running tegrastats without sudo and that was the cause. I thought I had used sudo before, but perhaps not. That may be the whole solution to this issue 😓️.

KevinFFF · May 27, 2024, 2:39pm

Yes, you should run tegrastats with sudo.
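
If tegrastats is started from a script on a headless board, a guard like the following can catch a missing sudo early. This is a minimal sketch; `is_root` is a hypothetical helper name and not part of tegrastats or JetPack.

```shell
#!/usr/bin/env bash
# is_root [UID]: succeed only if the given UID (default: current user) is 0.
# Hypothetical helper for illustration.
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

if ! is_root; then
    # Based on this thread: without sudo, some readings may be missing.
    echo "Not running as root; try: sudo tegrastats" >&2
fi
```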

