I want to know: do all kernels launched through the CUDA runtime call cuLaunchKernel (or cuLaunchKernel_ptsz, and so on) directly or indirectly? I look forward to your answers.
The exact API has varied over time, but yes. When you use this syntax:

my_kernel<<<...>>>(...);

to launch a kernel, the nvcc compiler will generally translate that non-standard C++ code into a sequence of one or more library calls, one of which is cudaLaunchKernel (or a similar variant) from the runtime API. The runtime API may, under the hood, invoke the driver API; cuLaunchKernel is the driver API member of this family of kernel-launch APIs.
The exact specifics (exact API/function names) as well as the exact mechanism (whether and how the runtime API makes use of the driver API) have varied over time, i.e. from one CUDA version to the next, and there is no guarantee that there won't be variation in the future. None of this is specified; it is considered an implementation detail, so applications or functionality that depend on these specifics may break from one CUDA version to the next, without prior notice.
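As a rough sketch of the translation described above (this is illustrative, not the compiler's actual generated code), the triple-chevron launch and an explicit cudaLaunchKernel call are two ways of launching the same kernel. The kernel name my_kernel and its parameters here are made up for the example:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A hypothetical kernel used only for illustration.
__global__ void my_kernel(int *out, int val) {
    out[threadIdx.x] = val;
}

int main() {
    int *d_out = nullptr;
    cudaMalloc(&d_out, 32 * sizeof(int));

    // 1) Triple-chevron syntax: nvcc rewrites this non-standard C++
    //    into a sequence of runtime API calls.
    my_kernel<<<1, 32>>>(d_out, 1);

    // 2) An explicit runtime API launch that the compiler's rewrite
    //    roughly resembles. Whether cudaLaunchKernel then calls
    //    cuLaunchKernel internally is an unspecified implementation
    //    detail that can change between CUDA versions.
    int val = 2;
    void *args[] = { &d_out, &val };
    cudaLaunchKernel((const void *)my_kernel, dim3(1), dim3(32),
                     args, /*sharedMem=*/0, /*stream=*/nullptr);

    cudaDeviceSynchronize();
    printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d_out);
    return 0;
}
```

Rather than relying on internal details, one can observe the runtime-to-driver mapping empirically with a profiler; for example, Nsight Systems (`nsys profile --trace=cuda ./a.out`) records both runtime and driver API activity for a run.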
Thanks for your detailed response, Robert. Further, given a model training or inference Python script written with PyTorch or TensorFlow, we know the backends of these frameworks will call APIs such as cublasGemm from certain GPU-accelerated libraries, such as libcudnn.so, libcublas.so, libcufft.so, and so on. Since these libraries are closed source, I am not sure whether their APIs launch the underlying kernels with this syntax, i.e. my_kernel<<<…>>>(…); Can you clarify? Thanks very much.
No, sorry, I can’t release internal details of closed source library code.