How to kill all GPU execution on NVIDIA Jetson AGX Orin directly from the Linux kernel?

Are there any commands the CPU can send to the GPU directly from the kernel to kill all computations on the GPU?

Hi,

Could you share more about the use case?

You could check whether cudaDeviceReset() meets your requirement:

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1gef69dd5c6d0206c2b8d099abac61f217

Thanks.

Thank you for your response. I’m interested in implementing a Linux kernel function that can instantly terminate a specific GPU task. cudaDeviceReset() is a user-space function, and it has to call into the GPU driver. Ultimately, the GPU driver dispatches certain commands to the GPU to halt the computation. I would greatly appreciate any details you could provide about kernel-level implementations.

Hi,

We need to check this with our internal team.
Will let you know later.

Thanks.

Hi,

We have got some info from our internal team.

We don’t have a public API that lets the CPU tell the GPU, from the kernel, to kill all computation.
But if you want to force recovery of a specific TSG/channel, please check NVGPU_IOCTL_CHANNEL_FORCE_RESET.

Thanks.
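
For anyone landing here later, this is a minimal sketch of what issuing that ioctl from user space might look like. It rests on several assumptions that are not confirmed in this thread: the helper name nvgpu_channel_force_reset is hypothetical, the channel file descriptor is assumed to have been opened elsewhere by your own code, and the request code is passed in rather than hard-coded because it should come from the nvgpu uapi header in the L4T kernel sources for your release. The ioctl is also assumed to take no argument, so please check the header before relying on this.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper (not an NVIDIA API): issue a no-argument ioctl
 * on an nvgpu channel file descriptor.
 *
 * channel_fd: assumed to be an already-open nvgpu channel fd.
 * request:    the ioctl request code; on a Jetson this would be
 *             NVGPU_IOCTL_CHANNEL_FORCE_RESET taken from the nvgpu
 *             uapi header (assumption: the ioctl takes no argument).
 *
 * Returns 0 on success, or a negative errno value on failure.
 */
int nvgpu_channel_force_reset(int channel_fd, unsigned long request)
{
    if (ioctl(channel_fd, request) < 0) {
        int err = errno; /* save errno before fprintf can change it */
        fprintf(stderr, "force reset failed: %s\n", strerror(err));
        return -err;
    }
    return 0;
}
```

With the nvgpu header included, a call would look roughly like nvgpu_channel_force_reset(fd, NVGPU_IOCTL_CHANNEL_FORCE_RESET). Passing the request code as a parameter keeps the sketch compilable on machines that do not have the header installed.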

Thank you for your detailed explanation. From what I gather, once a computation command is dispatched to the GPU, it cannot be interrupted until it has fully executed, correct? Are there any ways to reset the GPU to stop the computation?

Will NVGPU_IOCTL_CHANNEL_FORCE_RESET stop the ongoing computations that have been submitted through the channel?

Hi,

Yes, it resets the channel directly.
It will terminate all the work on that channel.

Thanks.

Thank you so much!
