Activating GPU Power Rails on AGX Orin without a GUI

Hello,

I’m trying to load the GPU for testing purposes, using gpu-burn. In tegrastats, GR3D_FREQ reaches 99%, but the GPU and CV temperatures remain at -256C.

I looked this issue up on these forums and found answers explaining that the -256C reading appears because the GPU is power gated: when it is not in use, its power is shut off, so the temperature sensors return no reading. The recommendations I found say to use the “Jetson Power GUI” to turn on the GPU power.

One issue: I am accessing this Orin over SSH, and it has no monitor or keyboard directly attached. Is there a tool or way to modify the power management so that the GPU is turned on and I can see the temperature readings?

An additional side question: I don’t understand how the GPU can be “off” while GR3D_FREQ is at 99%. Does power gating disconnect only the temperature sensors, and not the GPU itself?

Thank you for any clarification or assistance on this issue!
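
For checking sensors over SSH without a GUI, one option is to read the standard Linux thermal sysfs entries directly. The sketch below assumes the board exposes its sensors under /sys/class/thermal (the generic kernel interface, where temp is reported in millidegrees Celsius); the zone names and the value a power-gated zone returns are not confirmed anywhere in this thread, and list_thermal_zones is a hypothetical helper name.

```shell
# List every thermal zone with its raw reading (millidegrees Celsius).
# Usage: list_thermal_zones [base_dir]   (base_dir defaults to /sys/class/thermal)
list_thermal_zones() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        # Skip anything that is not a readable zone (also covers an unmatched glob).
        [ -r "$zone/type" ] || continue
        name=$(cat "$zone/type")
        # A gated or unavailable sensor may fail to read; report that instead of aborting.
        if milli=$(cat "$zone/temp" 2>/dev/null); then
            echo "$name $milli"
        else
            echo "$name unreadable"
        fi
    done
}
```

Calling list_thermal_zones with no argument prints one line per zone, which can be compared against the temperatures tegrastats shows.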

Hi mason15,

Are you using the devkit or custom board for AGX Orin?
What’s your Jetpack version in use?

Please also share the result of “sudo tegrastats” for further check.

Hi Kevin!

Are you using the devkit or custom board for AGX Orin?

This is a custom board using the Orin Industrial SOM.

What’s your Jetpack version in use?

I ran:

dpkg-query --show nvidia-l4t-core

And got:

nvidia-l4t-core	35.5.0-20240219203809

Please also share the result of “sudo tegrastats” for further check.

05-10-2024 11:54:03 RAM 3274/54718MB (lfb 10932x4MB) SWAP 0/27359MB (cached 0MB) CPU [2%@729,0%@729,0%@729,6%@729,0%@729,0%@729,0%@729,0%@729,0%@1497,0%@1497,0%@1497,13%@1497] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[0,0] VIC_FREQ 921 APE 174 CV0@-256C CPU@52.656C Tboard@42C SOC2@49.125C Tdiode@42.5C SOC0@50.687C CV1@-256C GPU@-256C tj@52.562C SOC1@50.312C CV2@-256C VDD_GPU_SOC 2154mW/2154mW VDD_CPU_CV 718mW/718mW VIN_SYS_5V0 7862mW/7862mW NC 0mW/0mW VDDQ_VDD2_1V8AO 796mW/796mW NC 0mW/0mW
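
To make the temperature fields in a line like the one above easier to scan, here is a minimal sketch that pulls out each NAME@TEMP pair and marks the -256C values. It assumes the line format shown in this post and a grep that supports -o and -E; tegrastats_temps is a hypothetical helper name, and the “no reading” label simply reflects the explanation given earlier in the thread.

```shell
# Read tegrastats output on stdin and print one "ZONE TEMP" line per
# thermal field, marking -256C values as having no reading.
tegrastats_temps() {
    # Match fields such as CPU@52.656C or GPU@-256C; load fields such as
    # 2%@729 are skipped because "%" is not allowed before the "@".
    grep -oE '[A-Za-z0-9]+@-?[0-9]+(\.[0-9]+)?C' | while IFS=@ read -r zone temp; do
        if [ "$temp" = "-256C" ]; then
            echo "$zone $temp (no reading)"
        else
            echo "$zone $temp"
        fi
    done
}
```

For example, `sudo tegrastats | tegrastats_temps` would print the thermal fields of each sample as they arrive.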

I’ve tried to verify this locally.
It seems matrixMul finishes too quickly for tegrastats to sample it.
Please try running it in a “while 1” loop.
With that, I could see both the GR3D_FREQ frequency and the GPU temperature come up.
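
One way to read the suggestion above, sketched in shell: rerun a short GPU workload back to back so that tegrastats has time to sample it. The helper name repeat_cmd and the ./matrixMul path are assumptions for illustration; the thread does not say where the binary lives or how many iterations are needed.

```shell
# Run a command a fixed number of times, back to back.
# Usage: repeat_cmd <count> <command> [args...]
repeat_cmd() {
    count="$1"
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        # Stop early if the workload itself fails.
        "$@" || return 1
        i=$((i + 1))
    done
}
```

For example, `repeat_cmd 1000 ./matrixMul > /dev/null &` followed by `sudo tegrastats` in the same SSH session keeps the GPU busy while the statistics are sampled. The literal “while 1” form would be `while true; do ./matrixMul > /dev/null; done`, which runs until interrupted with Ctrl+C.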

Apologies, but I am a little lost. When you say to use “while 1” to run “it” in a loop, what is it that I should be running in the loop? gpu-burn, or some other tool?