I want to know: do all kernels launched through the CUDA runtime call cuLaunchKernel (or cuLaunchKernel_ptsz, and so on) directly or indirectly? I look forward to your answers.
The exact API has varied over time, but yes. When you use this syntax:

my_kernel<<<...>>>(...);

to launch a kernel, the nvcc compiler will generally translate that non-standard C++ code into a sequence of one or more library calls, one of which is cudaLaunchKernel (or a similar variant) from the runtime API. The runtime API may, under the hood, invoke the driver API; cuLaunchKernel is the driver API member of this family of kernel-launch APIs.
The exact specifics (exact API/function names) as well as the exact mechanism (whether and how the runtime API makes use of the driver API) have varied over time, i.e. from one CUDA version to the next, and there is no guarantee that there won't be variation in the future. None of this is specified; it is considered an implementation detail, so applications or functionality that depend on these specifics may break from one CUDA version to the next, without prior notice.
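As a rough sketch of the translation described above (this is illustrative, not the compiler's actual generated code), the triple-chevron launch and an explicit cudaLaunchKernel call are two ways of launching the same kernel. The kernel name my_kernel and its parameters here are made up for the example:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A hypothetical kernel used only for illustration.
__global__ void my_kernel(int *out, int val) {
    out[threadIdx.x] = val;
}

int main() {
    int *d_out = nullptr;
    cudaMalloc(&d_out, 32 * sizeof(int));

    // 1) Triple-chevron syntax: nvcc rewrites this non-standard C++
    //    into a sequence of runtime API calls.
    my_kernel<<<1, 32>>>(d_out, 1);

    // 2) An explicit runtime API launch that the compiler's rewrite
    //    roughly resembles. Whether cudaLaunchKernel then calls
    //    cuLaunchKernel internally is an unspecified implementation
    //    detail that can change between CUDA versions.
    int val = 2;
    void *args[] = { &d_out, &val };
    cudaLaunchKernel((const void *)my_kernel, dim3(1), dim3(32),
                     args, /*sharedMem=*/0, /*stream=*/nullptr);

    cudaDeviceSynchronize();
    printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d_out);
    return 0;
}
```

Rather than relying on internal details, one can observe the runtime-to-driver mapping empirically with a profiler; for example, Nsight Systems (`nsys profile --trace=cuda ./a.out`) records both runtime and driver API activity for a run.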
Thanks for your detailed response, Robert. Further, given a model training or inference Python script written with PyTorch or TensorFlow, we know the backends of these frameworks will call APIs such as cublasGemm from certain GPU-accelerated libraries, such as libcudnn.so, libcublas.so, libcufft.so, and so on. Since these libraries are closed source, I am not sure whether their APIs launch the underlying kernels with this syntax, i.e. my_kernel<<<…>>>(…); Can you clarify? Thanks very much.
No, sorry, I can’t release internal details of closed source library code.