I’m processing several requests in parallel on my PC.
The processing procedure is implemented as a CUDA kernel.
Is it possible to launch a CUDA kernel from several CPU threads? How does it work, and what about the performance?
Thanks in advance.
Yes, it’s possible. The CUDA OpenMP sample code gives one example:
That sample happens to launch the kernels on separate GPUs, but it’s not difficult to modify the code to run on a single GPU. (You may also want to look at the CUDA concurrent kernels sample code, which is not multi-threaded, but demonstrates running multiple concurrent kernels on the same GPU.)
The simple Multi-GPU sample may also be of interest:
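As a rough sketch of the pattern (not taken from those samples — the kernel, buffer sizes, and thread count here are made up for illustration, and `std::thread` is used in place of OpenMP), each host thread can create its own stream on the same GPU and launch into it. The CUDA runtime has been thread-safe since CUDA 4.0, so all threads share the one device context:

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Trivial illustrative kernel: scales an array in place.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Each host thread uses its own stream on the same GPU, so the
// launches can overlap if the GPU has free resources.
void worker(int id, float *d_data, int n)
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    scaleKernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n, 2.0f);
    cudaStreamSynchronize(stream);  // wait for this thread's work only
    cudaStreamDestroy(stream);
    printf("host thread %d done\n", id);
}

int main()
{
    const int n = 1 << 20;
    const int numThreads = 4;

    // One device buffer per host thread, to avoid races between kernels.
    std::vector<float *> buffers(numThreads);
    for (auto &buf : buffers) cudaMalloc(&buf, n * sizeof(float));

    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back(worker, t, buffers[t], n);
    for (auto &th : threads) th.join();

    for (auto buf : buffers) cudaFree(buf);
    return 0;
}
```

Note that if every thread launched into the default stream instead, the kernels would serialize on the device even though the host threads run in parallel.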
I'd also like to know how the performance compares in the two cases.
That is, the duration of executing a kernel n times sequentially vs. the duration of launching the kernel from n CPU threads.
I would appreciate an answer on this, thanks.
PS: and also the duration of executing the kernel in n CUDA streams.
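One way to get a first answer yourself is to time both cases with CUDA events. The sketch below (kernel body, launch count, and sizes are all invented for illustration) times n back-to-back launches in the default stream against the same n launches spread over n streams. Kernels only overlap when each one leaves SMs idle, so a kernel large enough to saturate the GPU will show little or no speedup from streams:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel with some arithmetic per element so it takes
// measurable time. (The computed values are not used; this is a
// timing-only example.)
__global__ void busyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < 100; ++k)
            data[i] = data[i] * 1.0001f + 0.5f;
}

int main()
{
    const int n = 1 << 16;   // deliberately small, so kernels can overlap
    const int launches = 8;

    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms;

    // Case 1: n launches in the default stream -- they serialize.
    cudaEventRecord(start);
    for (int i = 0; i < launches; ++i)
        busyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("default stream: %.3f ms\n", ms);

    // Case 2: the same launches spread over n streams -- they may overlap.
    cudaStream_t streams[launches];
    for (int i = 0; i < launches; ++i) cudaStreamCreate(&streams[i]);
    cudaEventRecord(start);
    for (int i = 0; i < launches; ++i)
        busyKernel<<<(n + 255) / 256, 256, 0, streams[i]>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("n streams:      %.3f ms\n", ms);

    for (int i = 0; i < launches; ++i) cudaStreamDestroy(streams[i]);
    cudaFree(d_data);
    return 0;
}
```

Launching from n CPU threads (each with its own stream) should behave much like the n-streams case here, plus a small amount of host-side threading overhead; profiling with Nsight Systems would show whether the kernels actually overlap on your GPU.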