Hi,
I was looking into using --default-stream per-thread, or the equivalent #define CUDA_API_PER_THREAD_DEFAULT_STREAM, for driver API calls (specifically in the FFmpeg cuvid decoder),
but couldn’t find any documentation about how it applies to the driver API.
The program loads the DLL dynamically, so there is no nvcc invocation to which I can add --default-stream per-thread, and defining CUDA_API_PER_THREAD_DEFAULT_STREAM will not affect functions that are looked up in the DLL by name at run time.
Looking into cuda.h, I see the following macros being defined when CUDA_API_PER_THREAD_DEFAULT_STREAM is defined:
#if defined(__CUDA_API_VERSION_INTERNAL) || defined(CUDA_API_PER_THREAD_DEFAULT_STREAM)
#define __CUDA_API_PER_THREAD_DEFAULT_STREAM
#define __CUDA_API_PTDS(api) api ## _ptds
#define __CUDA_API_PTSZ(api) api ## _ptsz
These macros are then used for some of the APIs as follows:
#define cuMemcpyHtoD __CUDA_API_PTDS(cuMemcpyHtoD_v2)
...
#define cuMemcpy2D __CUDA_API_PTDS(cuMemcpy2D_v2)
...
#define cuStreamSynchronize __CUDA_API_PTSZ(cuStreamSynchronize)
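To check my reading of these macros, here is a small standalone sketch that reproduces the token pasting and stringifies the result. It does not include cuda.h; the DEMO_ and STR names and the helper functions are my own, made up for illustration, and only the three API defines are copied from the header.

```c
#include <assert.h>
#include <string.h>

/* Pretend the application was built with the define set. */
#define CUDA_API_PER_THREAD_DEFAULT_STREAM

/* Same token pasting as in the cuda.h excerpt, under non-reserved demo names. */
#if defined(CUDA_API_PER_THREAD_DEFAULT_STREAM)
#define DEMO_PTDS(api) api ## _ptds
#define DEMO_PTSZ(api) api ## _ptsz
#endif

/* The three API defines quoted above, routed through the demo macros. */
#define cuMemcpyHtoD        DEMO_PTDS(cuMemcpyHtoD_v2)
#define cuMemcpy2D          DEMO_PTDS(cuMemcpy2D_v2)
#define cuStreamSynchronize DEMO_PTSZ(cuStreamSynchronize)

/* Two-level stringification so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Each helper returns the symbol name its API macro expands to. */
const char *memcpy_htod_symbol(void) { return STR(cuMemcpyHtoD); }
const char *memcpy_2d_symbol(void)   { return STR(cuMemcpy2D); }
const char *stream_sync_symbol(void) { return STR(cuStreamSynchronize); }

/* Asserts the expansions I expect from reading the header. */
void check_expansions(void)
{
    assert(strcmp(memcpy_htod_symbol(), "cuMemcpyHtoD_v2_ptds") == 0);
    assert(strcmp(memcpy_2d_symbol(),   "cuMemcpy2D_v2_ptds") == 0);
    assert(strcmp(stream_sync_symbol(), "cuStreamSynchronize_ptsz") == 0);
}
```

So, if I read it correctly, code that includes cuda.h with the define set ends up referencing cuMemcpyHtoD_v2_ptds, cuMemcpy2D_v2_ptds and cuStreamSynchronize_ptsz instead of the unsuffixed names.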
Does this mean that by dynamically loading the _ptds / _ptsz variants of the APIs used in FFmpeg, I would get “default stream per thread” behaviour?
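Concretely, the loader change I have in mind would look roughly like the sketch below: try the suffixed name first and fall back to the plain name for functions that have no such variant. This is only a sketch under assumptions. The lookup is passed in as a function pointer so that it could wrap GetProcAddress or dlsym in the real program; resolve_per_thread and stub_lookup are hypothetical names of mine, and stub_lookup is a made-up stand-in for the driver library so the snippet runs without a GPU. Whether the real DLL exports the suffixed names, and whether calling them gives per-thread behaviour, is exactly what I am asking.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Shape of a symbol lookup; a real loader would wrap GetProcAddress
 * (Windows) or dlsym (Linux) behind this signature. */
typedef void *(*lookup_fn)(void *lib, const char *name);

/* Try "<base><suffix>" first (e.g. "cuMemcpyHtoD_v2" + "_ptds"), then fall
 * back to the plain name if the suffixed one is missing or does not fit. */
void *resolve_per_thread(lookup_fn lookup, void *lib,
                         const char *base, const char *suffix)
{
    char name[128];
    int n;

    assert(lookup != NULL);
    n = snprintf(name, sizeof name, "%s%s", base, suffix);
    if (n > 0 && (size_t)n < sizeof name) {
        void *fn = lookup(lib, name);
        if (fn != NULL)
            return fn;
    }
    return lookup(lib, base);
}

/* Hypothetical stand-in for the driver library: pretends that only
 * cuMemcpyHtoD_v2, cuMemcpyHtoD_v2_ptds and cuCtxCreate_v2 are exported. */
static int legacy_tag, ptds_tag, ctx_tag;

void *stub_lookup(void *lib, const char *name)
{
    (void)lib; /* the stub ignores the library handle */
    if (strcmp(name, "cuMemcpyHtoD_v2") == 0)      return &legacy_tag;
    if (strcmp(name, "cuMemcpyHtoD_v2_ptds") == 0) return &ptds_tag;
    if (strcmp(name, "cuCtxCreate_v2") == 0)       return &ctx_tag;
    return NULL;
}
```

With the stub, resolve_per_thread(stub_lookup, NULL, "cuMemcpyHtoD_v2", "_ptds") returns the entry registered under the _ptds name, while "cuCtxCreate_v2" falls back to the plain entry because the stub has no suffixed version of it.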