Hello
I have a basic question about multiple CPU threads calling a CUDA kernel.
Does the device memory have to be allocated from the thread that is calling the CUDA kernel, or can I allocate it from the main thread?
I am able to get correct results only when I allocate device memory from the CPU thread that calls the kernel.
Please let me know if this is unclear and I can post more complete sample code; a minimal sketch of the pattern is below.
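Roughly, this is the shape of it (the kernel, sizes, and pthreads usage here are placeholders, not my actual code):

```cuda
#include <cuda_runtime.h>
#include <pthread.h>

__global__ void scale(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

float *d_data;       // global, so every host thread can see the pointer
const int N = 1024;

void *worker(void *arg)
{
    // Kernel launched from a worker thread, using memory that was
    // allocated on the main thread -- this is the variant that gives
    // me incorrect results.
    scale<<<(N + 255) / 256, 256>>>(d_data, N);
    cudaThreadSynchronize();   // pre-4.0 spelling of cudaDeviceSynchronize
    return NULL;
}

int main(void)
{
    cudaMalloc((void **)&d_data, N * sizeof(float));  // main thread
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(&t, NULL);
    cudaFree(d_data);
    return 0;
}
```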
Thanks!
CUDA 3.2 (unfortunately).
I started on this project before 4.0 was officially released.
Just to clarify, as a test I made all my host variables global, so all threads in the process should have access to them.
Yeah, with 3.2 there's a CUDA context-to-thread mapping that you can control with the driver API. Alternatively, in 4.0 things just work the way you'd expect from the runtime API.
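For example, here's a rough, untested sketch of sharing one context between threads with the 3.2 driver API (error checking omitted; the mutex is my addition, needed because a context can be current to only one thread at a time):

```cuda
#include <cuda.h>
#include <pthread.h>

CUcontext ctx;
pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

// Any thread can borrow the shared context, one at a time.
void use_context_from_this_thread(void)
{
    pthread_mutex_lock(&ctx_lock);
    cuCtxPushCurrent(ctx);   // make the shared context current on this thread
    /* ... cuMemAlloc / kernel launches / cuMemcpy here ... */
    cuCtxPopCurrent(NULL);   // detach it so another thread can push it
    pthread_mutex_unlock(&ctx_lock);
}

int main(void)
{
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);  // created and made current on the main thread
    cuCtxPopCurrent(NULL);      // pop it so worker threads can claim it
    /* ... spawn threads that call use_context_from_this_thread() ... */
    cuCtxDestroy(ctx);
    return 0;
}
```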
OK. My (rather rudimentary) understanding was that GPU contexts only come into play with multiple GPUs on one host.
I have only one GPU on my host. Does it still apply?
Thanks!
Usually that's true, but in the runtime API pre-4.0 there's a one-to-one correspondence between CPU threads and GPU contexts. So if you create two threads in your application, each thread ends up with its own context on the same GPU, and a device pointer allocated in one context isn't valid in the other.
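That means the simplest fix under the pre-4.0 runtime is exactly what you found: keep every call that touches a given allocation on the same CPU thread. Building on your earlier sketch, something like:

```cuda
// Pre-4.0 runtime workaround: allocate, launch, and free all on the
// same CPU thread, so everything happens in that thread's context.
void *worker(void *arg)
{
    float *d_local;                                    // per-thread pointer
    cudaMalloc((void **)&d_local, N * sizeof(float));  // this thread's context
    scale<<<(N + 255) / 256, 256>>>(d_local, N);
    cudaThreadSynchronize();
    cudaFree(d_local);
    return NULL;
}
```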
Thank you very much for taking the time to answer my questions.
Very useful. I will read more about GPU contexts... or maybe it's time to switch over to 4.0 :)