From the Linux kernel's perspective, does the GPU give the CPU any signal that it has finished a kernel execution?

jay_hunter · June 27, 2023, 3:35pm

Without referring to user space, I’m curious whether the GPU will signal the CPU proactively, in a manner that the Linux kernel can handle, once it completes a kernel computation.

linuxdev · June 27, 2023, 5:36pm

I don’t know if this is what you’re really asking, but the GPU should generate a hardware IRQ at certain points, most likely including the point you are asking about. That IRQ is unlikely to be specific to the completion of a single computation; many computations might complete before the IRQ is raised.

jay_hunter · June 28, 2023, 7:14pm

Thank you for your response. I would like to confirm whether the GPU signals the completion of a kernel computation using an IRQ. Additionally, does the GPU send the IRQ upon the completion of the kernel computation itself, or after the kernel computation’s output has been transferred from the GPU task memory space to the CPU process memory space?

linuxdev · June 29, 2023, 7:28pm

Someone from NVIDIA would have to confirm that. Any transfer of memory from one piece of hardware (GPU) to another (CPU) would, though, result in at least one hardware IRQ. The details would require going into the source code of both (A) the driver being entered or left due to the IRQ, and (B) the scheduler policies. That’s such a general topic that I don’t think I can give you a useful answer.

AastaLLL · July 5, 2023, 6:27am

Hi,

Could you share more about your use case?
In general, the CPU waits for the GPU task to finish by using a synchronization call.

Thanks.
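For readers following along, the synchronization call mentioned above can be sketched from user space with the CUDA runtime API. This is a minimal sketch, not NVIDIA's recommended pattern: `scale_kernel`, `launch_and_wait`, and `completion_demo` are hypothetical names introduced here, and the choice of an event created with `cudaEventBlockingSync` (rather than `cudaStreamSynchronize` or `cudaDeviceSynchronize`) is an assumption made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: doubles each element.
__global__ void scale_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Launches the kernel on the default stream, records an event behind it,
// and blocks the calling CPU thread until that event has completed.
// Returns cudaSuccess on completion, or the first CUDA error encountered.
cudaError_t launch_and_wait(float *d_data, int n) {
    cudaEvent_t done;
    cudaError_t err = cudaEventCreateWithFlags(&done, cudaEventBlockingSync);
    if (err != cudaSuccess) return err;

    scale_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    err = cudaGetLastError();                                 // launch-time errors
    if (err == cudaSuccess) err = cudaEventRecord(done, 0);   // 0 = default stream
    if (err == cudaSuccess) err = cudaEventSynchronize(done); // wait for completion

    cudaEventDestroy(done);
    return err;
}

// Returns 0 if the kernel completed and produced the expected values.
int completion_demo() {
    const int n = 1024;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, sizeof(host)) != cudaSuccess) return 1;
    cudaMemcpy(d_data, host, sizeof(host), cudaMemcpyHostToDevice);

    cudaError_t err = launch_and_wait(d_data, n);

    cudaMemcpy(host, d_data, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return (err == cudaSuccess && host[0] == 2.0f && host[n - 1] == 2.0f) ? 0 : 1;
}
```

Calling `completion_demo()` from `main` and checking for a return value of 0 exercises the whole path. Note that this only shows how a user-space process learns that its own launch finished; it does not show how the completion is delivered inside the Linux kernel, which is the part of the question still open at this point in the thread.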

jay_hunter · July 10, 2023, 3:53pm

Thank you for your response. I aim to implement a function within the Linux kernel that monitors the completion status of a specified GPU kernel. Therefore, I’m currently exploring which task completion signals could be used to reliably determine the end of a task.

AastaLLL · July 11, 2023, 6:26am

Hi,

We need to check with our internal team.
We will share more information with you later.

Thanks.

AastaLLL · July 12, 2023, 2:27am

Hi,

Here is the info from our internal team.

It’s possible to force recover a TSG, but that is not terminating a specific task.

Since we do not track launches in kernel space, we cannot terminate a particular launch.
There is no API to do such a thing, nor is it even conceptually possible, because one does not kill tasks per se; rather, one can reset an engine that is running a particular task (defined here as a discrete batch of work submitted to the GPU, not a TSG).

If you can provide more info about the intended use case, it may be possible to help.

Thanks.

jay_hunter · July 12, 2023, 2:43am

Thank you for your response, but it does not seem to answer my question. My question is which task-completion signals can be used to reliably determine the end of a task in the Linux kernel.

AastaLLL · July 13, 2023, 4:17am

Hi,

We are still waiting for the answer to this question.
In the meantime, could you provide more info about the intended use case?

Thanks.

jay_hunter · July 15, 2023, 8:56pm

Thank you for your response. My specific requirement involves a Linux kernel mechanism designed to identify and handle a malicious GPU kernel that is unjustly monopolizing GPU resources. My objective is to send a software signal to the GPU kernel and then determine whether it completes promptly (at this stage, I require a dependable signal to confirm the kernel’s termination). If it does not complete after receiving the software signal, I need to forcefully terminate it to prevent potential misuse of resources (here I need a way to force-terminate the malicious GPU kernel).

linuxdev · July 16, 2023, 7:59pm

If it were user space, this would be “kill -9”. There are zombie processes and other places in the kernel where some intervention is needed to get at what is there, but there is a distinction between a missing process which is still scheduled and a process which is not responding but is still scheduled. I’m kind of mumbling here because it is an interesting problem. In most cases it is the scheduler which determines when a process shifts out of context, and cleanup goes through the scheduler. I guess I’m an odd person, because now you have me wondering whether the GPU itself has some form of hardware-based scheduler, or an internal equivalent of one, that accepts kill commands to force a process (or thread) out of context.

AastaLLL · July 28, 2023, 3:48am

Hi,

Here is the suggestion after our internal discussion:

Please build the monitoring and closing of the channels via CUDA APIs.
Is there any limitation, and why do you need to talk to the kernel?

Thanks.
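The monitoring half of that suggestion could look roughly like the sketch below: a user-space watchdog that polls a CUDA event with a deadline instead of blocking on it. This is an assumption about what "monitoring via CUDA APIs" might mean in practice, not something confirmed in the thread; `busy_kernel`, `WaitResult`, `wait_with_deadline`, and `run_with_watchdog` are hypothetical names, and the 1 ms polling interval is arbitrary.

```cuda
#include <chrono>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical long-running kernel: spins for roughly `cycles` GPU clock ticks.
__global__ void busy_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

enum class WaitResult { Completed, TimedOut, Failed };

// Polls `done` without blocking until it completes or `budget` elapses.
WaitResult wait_with_deadline(cudaEvent_t done, std::chrono::milliseconds budget) {
    auto deadline = std::chrono::steady_clock::now() + budget;
    for (;;) {
        cudaError_t status = cudaEventQuery(done);
        if (status == cudaSuccess) return WaitResult::Completed;   // work finished
        if (status != cudaErrorNotReady) return WaitResult::Failed; // real error
        if (std::chrono::steady_clock::now() >= deadline) return WaitResult::TimedOut;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

// Launches the kernel and reports whether it finished within `budget`.
WaitResult run_with_watchdog(long long cycles, std::chrono::milliseconds budget) {
    cudaEvent_t done;
    if (cudaEventCreateWithFlags(&done, cudaEventDisableTiming) != cudaSuccess)
        return WaitResult::Failed;

    busy_kernel<<<1, 1>>>(cycles);
    if (cudaGetLastError() != cudaSuccess || cudaEventRecord(done, 0) != cudaSuccess) {
        cudaEventDestroy(done);
        return WaitResult::Failed;
    }

    WaitResult result = wait_with_deadline(done, budget);
    cudaEventDestroy(done);  // permitted even if the event has not completed yet
    return result;
}
```

A call such as `run_with_watchdog(1000, std::chrono::milliseconds(5000))` should report `WaitResult::Completed` for a short kernel. On `TimedOut`, the sketch deliberately stops: as far as I know the CUDA runtime has no call that cancels one running launch, which matches the earlier reply that a particular launch cannot be terminated. One option, stated here as an assumption rather than NVIDIA guidance, is for a supervising process to terminate the process that owns the launch so that the driver tears down its GPU context.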

system · August 23, 2023, 1:53am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
