Does CPU process wait when calling CUDA code

yma1 · October 6, 2017, 7:12pm

I am curious about the state of a process while it is calling CUDA code. Suppose a process consists of three parts: part 1, CPU code; part 2, GPU code; part 3, CPU code. While the process is in the CUDA part, what is its status on the CPU side? Can the process be preempted, or does it hold the CPU while waiting for the CUDA code to return?

Also, suppose I have two CUDA processes and schedule them round-robin. What happens if a process’s time slice ends while its code is executing on the GPU? Does the process release the CPU or still hold it? And how does the GPU signal that its work is done? By an interrupt? Thanks.

Robert_Crovella · October 8, 2017, 2:49pm

CUDA kernel launches are asynchronous. This means that the CPU thread initiating the kernel launch makes a call into a library which starts the GPU processing. This library routine returns control to the CPU thread before the kernel has actually begun executing. The CPU thread can continue processing your code at that point (any code you have written after the point of the kernel launch), while the GPU kernel is executing.
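To make the asynchronous launch concrete, here is a minimal sketch. The kernel `busy_kernel` and the function `launch_then_keep_working` are hypothetical names made up for illustration; only the CUDA runtime calls are real. The kernel simply spins for a while so that the CPU loop after the launch has something to overlap with.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: busy-waits for about `cycles` GPU clock ticks, then sets a flag.
__global__ void busy_kernel(long long cycles, int *flag)
{
    long long start = clock64();
    while (clock64() - start < cycles) { /* spin on the GPU */ }
    *flag = 1;
}

// Launches the kernel, does unrelated CPU work, then waits for the GPU.
// Returns the flag written by the kernel (1 on success), or -1 on a CUDA error.
int launch_then_keep_working()
{
    int *d_flag = nullptr;
    if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemset(d_flag, 0, sizeof(int)) != cudaSuccess) { cudaFree(d_flag); return -1; }

    // The launch call returns to the CPU thread without waiting for the kernel.
    busy_kernel<<<1, 1>>>(50000000LL, d_flag);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d_flag); return -1; }

    // Ordinary CPU work; the CPU thread is not blocked by the launch above.
    unsigned long long sum = 0;
    for (unsigned long long i = 0; i < 1000000ULL; ++i) sum += i;
    printf("CPU finished its own work (sum = %llu)\n", sum);

    // Explicit synchronization point: the CPU thread waits here for the GPU.
    if (cudaDeviceSynchronize() != cudaSuccess) { cudaFree(d_flag); return -1; }

    int h_flag = 0;
    cudaError_t err = cudaMemcpy(&h_flag, d_flag, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_flag);
    return (err == cudaSuccess) ? h_flag : -1;
}
```

Calling `launch_then_keep_working()` from `main` should return 1 on a machine with a working CUDA device. The CPU thread only waits at the `cudaDeviceSynchronize()` call, not at the launch itself.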

Two or more CPU processes can share a GPU in default compute mode, through a mechanism known as context-switching. A description of a GPU context is given in the programming guide:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#context

It is, roughly speaking, the GPU state associated with a CPU process that is using the GPU. Two separate processes will usually have two separate contexts, if they are using the same GPU.
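If you want to check whether a device is actually in default compute mode (the mode in which several processes can share it), something like the following sketch should work. The helper names `query_compute_mode` and `compute_mode_name` are hypothetical; `cudaDeviceGetAttribute` and `cudaDevAttrComputeMode` are from the CUDA runtime API.

```cuda
#include <cuda_runtime.h>

// Returns the compute mode of `device` as an integer, or -1 on a CUDA error.
int query_compute_mode(int device)
{
    int mode = -1;
    if (cudaDeviceGetAttribute(&mode, cudaDevAttrComputeMode, device) != cudaSuccess)
        return -1;
    return mode;
}

// Maps a compute mode value to a short readable name.
const char *compute_mode_name(int mode)
{
    switch (mode) {
        case cudaComputeModeDefault:          return "default (multiple processes may share the GPU)";
        case cudaComputeModeProhibited:       return "prohibited (no contexts can be created)";
        case cudaComputeModeExclusiveProcess: return "exclusive-process (one process at a time)";
        default:                              return "other/unknown";
    }
}
```

For example, `printf("%s\n", compute_mode_name(query_compute_mode(0)));` prints the mode of device 0. The mode itself is normally set by an administrator, not from inside the application.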

The detailed behavior of context switching is not specified anywhere that I know of, but a general rule is that while one or more kernels from a particular process are executing, no kernels from any other process may be executing. When the kernels from that process finish, the GPU may, at its unspecified discretion, choose to process additional work from the same process/context (e.g. more kernel launches, perhaps) or it may choose to context-switch and service work requests from other processes.

Again, I know of no concise, unified specification for GPU context-switching that answers detailed questions such as how and under what circumstances a context-switch will occur.

Normally, when using the CUDA runtime API, a GPU context is destroyed when the CPU process owning it terminates. Context destruction should result in automatic release of any resources (e.g. GPU memory allocations) still owned by that context.

yma1 · October 8, 2017, 7:13pm

Thanks for your reply. Can I understand the GPU as working like a keyboard: if a kernel finishes while the host CPU process is waiting (not holding the CPU), an interrupt is raised to wake the host CPU process up?

Robert_Crovella · October 8, 2017, 7:31pm

That sort of low-level description of how the hardware interacts with its driver is not documented anywhere that I know of.

From a programmer’s perspective, it should be sufficient for most cases I can imagine, simply to acknowledge that the GPU and driver have communication paths between them, and somehow the driver keeps track of the GPU state, and knows when to issue new work.

The only time a host CPU process would be waiting on the GPU is if it encountered a synchronization point, such as a call to cudaDeviceSynchronize() or cudaMemcpy(), to pick two possible examples. Somehow, the CPU thread “waits” on the GPU/GPU driver at these points, and somehow the driver allows the host thread to continue when ready. Since these points involve calls into the CUDA runtime library, I would think a sufficient mental model is that the relevant library routine does not return until the condition is satisfied.
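As a sketch of that mental model: in the code below there is no explicit `cudaDeviceSynchronize()`, yet the result is complete, because the `cudaMemcpy()` call (device to ordinary host memory, default stream) does not return until the preceding kernel has finished and the copy is done. The kernel `fill` and the function `count_filled_after_memcpy` are hypothetical names for illustration, and the sketch assumes `n > 0`.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: writes `value` into every element of `out`.
__global__ void fill(int *out, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Launches the kernel and copies the result back with a plain cudaMemcpy.
// Returns how many elements equal `value`, or -1 on a CUDA error.
int count_filled_after_memcpy(int n, int value)
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fill<<<blocks, threads>>>(d_out, n, value);   // asynchronous launch

    // Host buffer starts with a sentinel that differs from `value`.
    std::vector<int> h_out(n, value - 1);

    // Synchronization point: the library routine returns only after the
    // kernel above has completed and the data has been copied.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int count = 0;
    for (int v : h_out) if (v == value) ++count;
    return count;
}
```

Calling `count_filled_after_memcpy(1024, 7)` should return 1024: every element was written before the copy was allowed to proceed.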

The programmer has some control over the library and machine behavior at these thread-blocking points, whether it is a spin-wait type of behavior, or a yield behavior, via CUDA runtime API calls which modify this CPU thread blocking behavior:

http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g69e73c7dda3fc05306ae7c811a690fac

I’d encourage you to read the cudaSetDeviceFlags call described at the above link.
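A minimal sketch of using that call, assuming it is made early in the program before any other CUDA work (on some CUDA versions, setting these flags after the context is already active returns an error). The names `noop_kernel` and `sync_with_blocking_wait` are hypothetical; the flag constants are from the runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel, just to give the GPU something to run.
__global__ void noop_kernel() { }

// Asks the runtime to block the CPU thread (rather than spin) at
// synchronization points, then launches a kernel and waits for it.
// Returns true if the flag was accepted and the wait completed without error.
bool sync_with_blocking_wait()
{
    // Alternatives: cudaDeviceScheduleSpin  (spin-wait, lowest latency),
    //               cudaDeviceScheduleYield (yield the CPU thread while waiting),
    //               cudaDeviceScheduleAuto  (let the runtime choose; the default).
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
    if (err != cudaSuccess) {
        printf("cudaSetDeviceFlags failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    noop_kernel<<<1, 1>>>();
    if (cudaGetLastError() != cudaSuccess) return false;

    // With the blocking-sync flag, the thread waiting here should be blocked
    // on a synchronization primitive rather than spinning on the CPU.
    return cudaDeviceSynchronize() == cudaSuccess;
}
```

Which flag is best depends on the workload: spinning tends to give the lowest wake-up latency at the cost of a busy CPU core, while blocking or yielding frees the core for other threads and processes.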

yma1 · October 8, 2017, 7:58pm

Enn. Thanks for your reply. I think I already have the answer.
