Deep sleep on Jetpack 4.3

We recently switched from Jetpack 3.3 to Jetpack 4.3 in order to use a vendor’s camera driver (Leopard IMX415). However, we’re hitting an issue with SC7 sleep mode: in Jetpack 3.3, the TX2 could go to sleep while the GPU was still in use, but that is no longer possible in Jetpack 4.3. Specifically, the error is:

[12776.228368] dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -16
[12776.235099] PM: Device 17000000.gp10b failed to suspend: error -16

The steps we are doing to get to this point are:

  1. Load a PyTorch model on the GPU and wait indefinitely for images.
  2. Call “sudo systemctl suspend” in a separate window/thread to put the system to sleep.
  3. Observe whether the system stays asleep.
  4. See that it does not stay asleep, and check dmesg for error messages.
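
For anyone reproducing this, step 4 can be scripted. This is only a minimal sketch, assuming the kernel log uses the same "PM: Device ... failed to suspend" format as the lines quoted above; the function name suspend_failures is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of L4T): list devices that refused to suspend.
# Assumes log lines of the form
#   "PM: Device <name> failed to suspend: error <n>"
# as in the dmesg excerpt quoted in this thread.
suspend_failures() {
    # Reads kernel log text on stdin and prints "<device> <error>" per failure.
    sed -n 's/.*PM: Device \([^ ]*\) failed to suspend: error \(-\{0,1\}[0-9]*\).*/\1 \2/p'
}

# Usage on the target (dmesg may require root):
#   sudo dmesg | suspend_failures
```

An empty result after a suspend attempt would suggest that no device blocked the suspend, at least according to the log format assumed here.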

Is there anything we’re missing that would let the TX2 stay asleep while the GPU is in use?

So there is no problem when the PyTorch model is not running, right?

That is correct.

Is PyTorch running any camera software while going to suspend?

Yeah, this is because the TX2 has a joint rail, and CUDA is not allowing the system to sleep when it looks at the joint rail property. It’s the same issue that was reported internally.

@ShaneCCC, no, we already know that when the camera is on the system cannot go to sleep - that is ok with us. We would just like the models to be loaded on the GPU so we do not have to wait for the models to load again when waking up the TX2.

@Bibek, is there a roadmap to fix this issue? It’s interesting that Jetpack 3.3 had this working, but Jetpack 4.3 does not.

Yes, we are working on it.
As of now, you can remove this patch from the kernel to get back to the older behavior.
Remove the highlighted nvidia,tegra-joint_xpu_rail; property below from the dts file:

chosen {
    bootargs = "console=ttyTCU0,115200";
    board-has-eeprom;
    nvidia,tegra-joint_xpu_rail;
};
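
To confirm that the change took effect after rebuilding and flashing the device tree, one option is to look for the property in the live tree. This is a sketch, assuming the kernel exposes the device tree under /proc/device-tree with one file per property; the function name has_joint_rail is hypothetical.

```shell
#!/bin/sh
# Hypothetical check (not from NVIDIA): is the joint-rail property present
# in the running device tree?
# Assumption: /proc/device-tree mirrors the DTB, with each property of the
# "chosen" node appearing as a file named after the property.
has_joint_rail() {
    dt_root="${1:-/proc/device-tree}"
    [ -e "$dt_root/chosen/nvidia,tegra-joint_xpu_rail" ]
}

# Usage on the target:
#   if has_joint_rail; then echo "property still present"; else echo "property removed"; fi
```

The optional argument only exists so the check can be pointed at a different root for testing.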

Thanks @Bibek! Might I ask what the joint xpu rail does?

In the TX2, the CPU and GPU are on the same OVR regulator, so the power rail is joint. We added this property so that nvgpu does not turn off the GPU engines (to save leakage power), since the rail is always on while both the CPU and GPU are on.
But due to this property, CUDA does not ask for GPU power gating, so nvgpu is busy and thus suspend fails.

Unfortunately, it appears that I am still running into this issue sometimes - most of the time the system does go to sleep successfully, but there are times when it still does not. What else could be the issue?

You need to add another change on the target.
File:
etc/systemd/nv.sh

Add this change, only the lines in bold:
if [ "$SOCFAMILY" = "tegra186" ]; then
    if [ -d "/sys/kernel/debug/bpmp/debug/clk/nafll_tsec/" ]; then
        echo 1 > /sys/kernel/debug/bpmp/debug/clk/nafll_tsec/state
        cat /sys/kernel/debug/bpmp/debug/clk/nafll_tsec/min_rate > \
            /sys/kernel/debug/bpmp/debug/clk/nafll_tsec/rate
    fi

    if [ -f "/sys/devices/17000000.gp10b/railgate_enable" ]; then
        echo 0 > /sys/devices/17000000.gp10b/railgate_enable
    fi
fi
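
When comparing suspend behaviour between settings, it may help to print the current railgate value before each attempt. This is a small sketch that only reads the sysfs node named in the script above; the function name railgate_state and the "unavailable" fallback string are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the current value of railgate_enable, or
# "unavailable" if the node does not exist on this system.
# The default path is the one used in the nv.sh change above.
railgate_state() {
    dev_dir="${1:-/sys/devices/17000000.gp10b}"
    if [ -f "$dev_dir/railgate_enable" ]; then
        cat "$dev_dir/railgate_enable"
    else
        echo "unavailable"
    fi
}

# Usage on the target, before calling "sudo systemctl suspend":
#   railgate_state
```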

@Bibek, it looks like setting it to 0 makes it unable to go to sleep, but setting it to 1 allows it to. Is that what you intended? Otherwise I may be doing something wrong.

Ignore the last comment. Can you share the log from when it fails with the joint-rail property removed?

It hasn’t seemed to fail yet, so I’ll keep you posted if/when it does.