Hello,
I would like to launch CUDA kernels from different threads.
Is there any problem with calling CUDA functions from different std::thread instances?
Will I run into trouble if I allocate CUDA memory or copy memory from different threads?
You don’t actually have to use a separate thread. CUDA kernel launches are non-blocking, so you can just launch a kernel and it’ll execute in the background while your current thread continues executing.
Edit:
Realized I didn’t actually answer your question :P
No, there’s no added danger in calling a CUDA kernel from a separate thread. And because threads in a process share the same memory space, it’s even safe to allocate in one thread and then simply pass the pointer to the allocation around by value. Mind you, you’ll have to manually manage the lifetime of the allocation yourself so this isn’t particularly recommended.
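To make the point above concrete, here is a minimal sketch of several std::thread workers that each do their own allocation, copy, kernel launch, and copy back. The kernel add_value and the helpers worker and run_workers are hypothetical names made up for this illustration, and the sketch assumes a single GPU and the CUDA runtime API.

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds `value` to every element.
__global__ void add_value(int* data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += value;
}

// Work done by one host thread: its own allocation, copy, launch, copy back.
// Returns true if every CUDA call succeeded and the result is as expected.
bool worker(int id, int n) {
    std::vector<int> host(n, 0);
    int* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host.data(), n * sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        add_value<<<(n + 255) / 256, 256>>>(dev, n, id);
        // The blocking cudaMemcpy below also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host.data(), dev, n * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);

    for (int v : host) ok = ok && (v == id);
    return ok;
}

// Runs `num_threads` workers concurrently and reports whether all succeeded.
bool run_workers(int num_threads, int n) {
    // One char per thread (not vector<bool>) so concurrent writes do not race.
    std::vector<char> results(num_threads, 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back([&results, t, n] { results[t] = worker(t + 1, n); });
    for (auto& th : threads) th.join();
    for (char r : results)
        if (!r) return false;
    return true;
}

int main() {
    std::printf("%s\n", run_workers(4, 1 << 16) ? "all threads ok" : "failure");
    return 0;
}
```

One caveat: with the default (legacy) stream, work submitted from different host threads is serialized on the device, so this is correct but not necessarily concurrent. Compiling with nvcc --default-stream per-thread, or giving each thread its own cudaStream_t, is the usual way to let the kernels overlap.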
Remember, CUDA is a C++ API. C++ supports RAII and that’s a paradigm I wholeheartedly endorse instead of passing around raw pointers and manually releasing the resources yourself.
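As a rough sketch of what RAII could look like here, the snippet below wraps a device allocation in a std::unique_ptr with a custom deleter. CudaFreeDeleter, DeviceBuffer, and make_device_buffer are names invented for this example and are not part of the CUDA toolkit.

```cuda
#include <cstddef>
#include <cstdio>
#include <memory>
#include <cuda_runtime.h>

// Deleter that hands the pointer back to cudaFree (hypothetical helper).
struct CudaFreeDeleter {
    void operator()(void* p) const noexcept { cudaFree(p); }
};

// Owning handle to device memory; freed automatically when it goes out of scope.
template <typename T>
using DeviceBuffer = std::unique_ptr<T[], CudaFreeDeleter>;

// Allocates `count` elements on the device; returns an empty handle on failure.
template <typename T>
DeviceBuffer<T> make_device_buffer(std::size_t count) {
    T* raw = nullptr;
    if (cudaMalloc(&raw, count * sizeof(T)) != cudaSuccess) raw = nullptr;
    return DeviceBuffer<T>(raw);
}

int main() {
    {
        auto buf = make_device_buffer<float>(1024);
        if (!buf) {
            std::printf("allocation failed\n");
            return 1;
        }
        cudaMemset(buf.get(), 0, 1024 * sizeof(float));
    }  // cudaFree runs here, when buf leaves scope
    return 0;
}
```

Because the handle is move-only, ownership can be handed to another thread with std::move, and kernels still receive the raw pointer through buf.get().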
Thank you for the answer. I know that when I use cudaMemcpy from different threads I have to worry about the copies interfering with each other, so I had to use the asynchronous cudaMemcpy (cudaMemcpyAsync) to avoid that.
But when I only run kernels at the same time from different CPU threads, do I have to launch the kernels asynchronously on streams, or should this be all right as it is?
I just noticed a strange thing. I made two programs. The first uses multiple threads, each of which runs the same kernel function along with the CUDA memory allocation and copies. The second is the same program executed in only one thread.
I compared running the first program against running the second program multiple times (I ran ./myprogram in different terminals so that the instances ran in parallel).
The second version seems to work properly, with no conflict between memory allocations and copies, while the first goes wrong. Shouldn't both kinds of execution go wrong? Why does the second seem to work properly?
You should definitely post a small example of what’s going on just so we can all take a look at it.