Can't wake up from deep sleep in CUDA function

Our board and system are as follows.

  • NVIDIA Jetson Xavier NX (Developer Kit Version)
    • Jetpack 4.6 [L4T 32.6.1]
    • NV Power Mode: MODE_15W_6CORE - Type: 2
    • jetson_stats.service: active
  • Board info:
    • Type: Xavier NX (Developer Kit Version)
    • SOC Family: tegra194 - ID:25
    • Module: P3668 - Board: P3509-000
    • Code Name: jakku
    • CUDA GPU architecture (ARCH_BIN): 7.2
    • Serial Number: 1424920027702
  • Libraries:
    • CUDA: 10.2.300
    • cuDNN: 8.2.1.32
    • TensorRT: 8.0.1.6
    • Visionworks: 1.6.0.501
    • OpenCV: 4.5.1 compiled CUDA: YES
    • VPI: ii libnvvpi1 1.1.12 arm64 NVIDIA Vision Programming Interface library
    • Vulkan: 1.2.70

We have an issue: after the board wakes from deep sleep, a CUDA function does not resume. Please see the gdb thread call stack below. The thread is always blocked in the CUDA function and can’t resume after deep sleep.

We used “sudo systemctl suspend” to go into deep sleep.

Thread 7 (Thread 0x7f6c903960 (LWP 18175)):
#0 0x0000007f9158d538 in ioctl ()
at …/sysdeps/unix/sysv/linux/aarch64/ioctl.S:25
#1 0x0000007f80a85758 in ()
at /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so
#2 0x0000007f80a88d68 in NvRmHost1xSyncpointWait ()
at /usr/lib/aarch64-linux-gnu/tegra/libnvrm_graphics.so
#3 0x0000007f29e7a774 in () at /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#4 0x0000007f29cfb820 in () at /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#5 0x0000007f29e3f6f0 in () at /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#6 0x0000007f29d7c2e8 in () at /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#7 0x0000007f29d7c440 in () at /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#8 0x0000007f29cedd18 in () at /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#9 0x0000007f29def5d0 in cuStreamSynchronize ()
at /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#10 0x0000007f92aac38c in cudart::cudaApiStreamSynchronize(CUstream_st*) ()
at /mnt/ssd/github/saturn_acam/…/…/…/…/usr/local/lib/libAcamBeamforming.so
#11 0x0000007f92aeb1e0 in cudaStreamSynchronize ()
at /mnt/ssd/github/saturn_acam/…/…/…/…/usr/local/lib/libAcamBeamforming.so
#12 0x0000007f92a99f8c in AcamCalcBeamforming(_handle*, int, int) ()
at /mnt/ssd/github/saturn_acam/…/…/…/…/usr/local/lib/libAcamBeamforming.so
#13 0x0000007f92a9e4b4 in BMThreadRun(void*) ()
at /mnt/ssd/github/saturn_acam/…/…/…/…/usr/local/lib/libAcamBeamforming.so
#14 0x0000007f91896088 in start_thread (arg=0x7fcaff1fdf)

The system itself wakes from deep sleep; only that thread fails to resume and stays blocked there.

Hi,

Could you share more details about your use case?

Please note that GPU tasks are launched through CPU calls.
It will be better if you wait for the GPU task to finish and relaunch a new one when the system wakeup.

Thanks.

Our use case: we run an acoustic program that uses the CUDA API to do some math calculations and shows an acoustic heat map on screen. When the user presses the power key, we put the device into deep sleep mode. When the user presses the power key again, the device wakes from deep sleep.

After wake-up, the acoustic heat map is frozen. After debugging, we found that this is because the thread is blocked in a CUDA function and can’t resume.

Regarding your comment, “It will be better if you wait for the GPU task to finish and relaunch a new one when the system wakeup.”, can you provide more detail?

“wait for the GPU task to finish”: when should we wait for the GPU task to finish? Before deep sleep?
“relaunch a new one when the system wakeup”: wake-up from deep sleep is done by hardware; from the software side, execution just continues from the point where it went to sleep. How can we relaunch a new one on wake-up?

Hi,

Do you feed signals into the CUDA function continuously?

If yes, when the user presses the power button, please stop triggering the CUDA kernel and go to sleep only after all kernel jobs are done.
If the kernel takes a long time to finish, you might need to terminate it manually.

Is this possible for your use case?

Thanks.
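
The advice above ("go to sleep only after the kernel jobs are done") can be sketched as a bounded wait before suspending. This is only an illustration, not code from this thread: `wait_until_idle` and `is_idle` are hypothetical names. In a real application `is_idle` might wrap a check such as `cudaStreamQuery(stream) == cudaSuccess`; here it is an arbitrary callable so the sketch builds without CUDA.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `is_idle` until it returns true or `timeout`
// elapses. Returns true if the pending work drained in time, false otherwise.
inline bool wait_until_idle(const std::function<bool()>& is_idle,
                            std::chrono::milliseconds timeout,
                            std::chrono::milliseconds poll_interval =
                                std::chrono::milliseconds(5)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!is_idle()) {
        // Give up once the deadline has passed; the caller decides what to do.
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(poll_interval);
    }
    return true;
}
```

If this returns false, the work did not finish within the timeout, and the caller would have to decide whether to delay the suspend or tear the work down some other way.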

We are trying to implement this solution:

  1. When the user presses the power key to go to sleep, we stop the acoustic thread to make sure it won’t call any CUDA functions, then call systemctl suspend.
  2. When the user presses the power key again to wake from deep sleep, we check and restart the acoustic thread.
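
A minimal sketch of the two steps above, under some assumptions: `AcousticWorker`, `step`, and `suspend_with_worker_paused` are hypothetical names, not taken from our code. `step` stands in for one iteration of the beamforming work (launch plus synchronize) and is assumed to return on its own. `suspend` stands in for whatever triggers `systemctl suspend` and is assumed to return only after the system has resumed, which should be verified on the target.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical wrapper around the acoustic thread.
class AcousticWorker {
public:
    explicit AcousticWorker(std::function<void()> step) : step_(std::move(step)) {}
    ~AcousticWorker() { stop(); }

    void start() {
        if (thread_.joinable()) return;  // already running
        stop_requested_ = false;
        thread_ = std::thread([this] {
            // Run one step at a time so a stop request is seen between steps.
            while (!stop_requested_.load()) step_();
        });
    }

    // Returns only after the thread has exited, so no step is in flight.
    void stop() {
        stop_requested_ = true;
        if (thread_.joinable()) thread_.join();
    }

    // Call only from the thread that calls start() and stop().
    bool running() const { return thread_.joinable(); }

private:
    std::function<void()> step_;
    std::atomic<bool> stop_requested_{false};
    std::thread thread_;
};

// Power-key handler: stop the worker, suspend, restart the worker.
inline void suspend_with_worker_paused(AcousticWorker& worker,
                                       const std::function<void()>& suspend) {
    worker.stop();   // step 1: no more GPU calls from the acoustic thread
    suspend();       // assumed to block until the system has resumed
    worker.start();  // step 2: restart the acoustic thread
}
```

Joining the thread in `stop()` is the point of the design: once it returns, the acoustic thread cannot be sitting inside a synchronize call when the system suspends.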

We will test it and tell you the result.

Thanks

Have you managed to resolve the issue, or do you still need support? Thanks

We stop the CUDA work before sleep and start it again after wake-up. With that change, the issue no longer occurs.