Hi CUDA team,
I’m curious whether the CUDA Driver API function cuLaunchKernel is thread safe, specifically when it is called on the same CUfunction concurrently from multiple host threads (in the same CUDA context).
CUfunction is an alias for a pointer to an opaque type (typedef CUfunc_st * CUfunction), and the first argument to cuLaunchKernel is such a pointer to non-const, which is not a good sign for thread safety. Indeed, the CUDA 10.1 documentation (CUDA Driver API :: CUDA Toolkit Documentation) says: “Calling cuLaunchKernel() sets persistent function state that is the same as function state set through the following deprecated APIs: cuFuncSetBlockShape(), cuFuncSetSharedSize(), cuParamSetSize(), cuParamSeti(), cuParamSetf(), cuParamSetv(). When the kernel f is launched via cuLaunchKernel(), the previous block shape, shared size and parameter info associated with f is overwritten.”
Thus we’re told that the CUfunction is modified by calls to cuLaunchKernel. Without internal synchronization, concurrent calls would therefore be a data race, and the function would not be thread safe. This documentation does not mention thread safety.
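To make the concern concrete, here is a minimal sketch of the hazard with a legacy-style "configure, then launch" API. It does not call the CUDA driver: MockFunction, mock_set_block_shape, mock_launch_from_state and replay_bad_interleaving are hypothetical names I made up for illustration. One unlucky interleaving of two host threads is replayed on a single thread so the outcome is deterministic.

```cpp
#include <cassert>

// Hypothetical stand-in for an opaque function handle; NOT the real CUfunc_st.
struct MockFunction {
    unsigned blockX = 0;  // persistent state, like the legacy block shape
};

// Legacy-style two-step API: first configure the handle, then launch
// using whatever state the handle currently holds.
void mock_set_block_shape(MockFunction& f, unsigned x) { f.blockX = x; }
unsigned mock_launch_from_state(const MockFunction& f) { return f.blockX; }

// Replays one bad interleaving of "thread A" and "thread B" sequentially.
// Returns the block size that thread A's launch actually observes.
unsigned replay_bad_interleaving() {
    MockFunction f;
    mock_set_block_shape(f, 128);      // thread A configures for 128
    mock_set_block_shape(f, 256);      // thread B configures before A launches
    return mock_launch_from_state(f);  // thread A launches and sees 256
}
```

In this mock, thread A asked for a block size of 128 but launches with 256, which is the kind of outcome I would expect from shared mutable state on the handle if nothing synchronizes it.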
However, an old doc from CUDA Toolkit 4 (page 7 of http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_4.0_Readiness_Tech_Brief.pdf ) says:
“For thread-safety, host threads launching kernels in the same context
concurrently must use the new thread-safe stateless launch API function
cuLaunchKernel(), which takes the place of the more verbose earlier API (i.e.,
cuParamSet*() + cuFuncSetBlockShape() + cuFuncSetSharedSize() + cuLaunchGrid()).
Note that with this new API, kernel grid launches in the CUDA Driver API more
closely resemble kernel launches via the <<<>>> syntax of the CUDA Runtime API.”
This seems to contradict the other doc by saying cuLaunchKernel is stateless (though it is for an older version of CUDA).
So what’s the deal here? Is cuLaunchKernel stateless or stateful? Is it safe to call concurrently from multiple host threads on the same CUfunction (in the same CUDA context)? If it is thread safe and stateful, how does it achieve this?
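For what it's worth, one way an API could be both stateful and thread safe is to write and consume the persistent state under a single internal lock. The sketch below shows only that idea; it is my assumption about a possible design, not a claim about how the driver is implemented. LockedFunction, LaunchRecord, locked_launch and count_consistent_launches are hypothetical names, and nothing here touches the CUDA driver.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical function handle with persistent state and an internal lock.
struct LockedFunction {
    std::mutex mu;
    unsigned blockX = 0;  // overwritten by every launch
    int param = 0;        // overwritten by every launch
};

// What a launch actually used, as seen by the caller.
struct LaunchRecord {
    unsigned blockX;
    int param;
};

// Single-call launch: the state is written and read back while holding
// the lock, so no other thread can overwrite it in between.
LaunchRecord locked_launch(LockedFunction& f, unsigned blockX, int param) {
    std::lock_guard<std::mutex> lock(f.mu);
    f.blockX = blockX;
    f.param = param;
    return LaunchRecord{f.blockX, f.param};
}

// Launches concurrently from several host threads on ONE shared handle and
// counts the launches that observed exactly the arguments they passed in.
int count_consistent_launches(int numThreads, int launchesPerThread) {
    LockedFunction f;
    std::vector<int> ok(numThreads, 0);  // one slot per thread, no sharing
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) {
        threads.emplace_back([&f, &ok, t, launchesPerThread] {
            const unsigned bx = 32u * static_cast<unsigned>(t + 1);
            for (int i = 0; i < launchesPerThread; ++i) {
                LaunchRecord r = locked_launch(f, bx, t);
                if (r.blockX == bx && r.param == t) ++ok[t];
            }
        });
    }
    for (auto& th : threads) th.join();
    int total = 0;
    for (int v : ok) total += v;
    return total;
}
```

With this design every launch sees its own arguments, so count_consistent_launches(4, 1000) returns 4000, even though the handle is mutated on every call. Whether cuLaunchKernel does something like this internally is exactly what I am asking.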
Regards,
Thomas Peters