Multi-thread and multi-card support: How does CUDA handle multiple CPUs/GPUs?

I have a couple of simple questions about support for multi-threading in CUDA, which I’m looking at using as a compute farm for financial analytics:

  • Does CUDA support multiple cards? (The docs imply it does, but I haven’t seen any examples or articles about this.)
  • If this is the case, does CUDA show SLI cards as single or multiple devices?
  • Are the CUDA libraries and drivers thread-safe, i.e. could I run multiple CUDA algorithms from separate threads on the same card? (The docs suggest that separate GPU devices need separate threads, but there’s no mention of what happens if you point multiple threads at the same device.)

And, as an aside, does anyone know of any hardware vendor that sells a workstation that would support dual Quad-core Intel + dual 8800 cards?


As the manual says, CUDA supports multiple cards. SLI makes no sense outside graphics rendering, so it doesn’t matter whether the cards are SLI-connected or not: CUDA provides access to them as separate computing devices. You can select one or another through API calls (cudaSetDevice()). It is all explained in the manual; read Appendix D carefully ;)

Ahah… there is no Appendix D in my copy (0.8)… I’ll download the latest.



Also, there is a multiGPU sample in the CUDA SDK.
You may also have to disable the SLI mode in the driver to use two or more cards for CUDA computing.


Hi, what about the last question, “could I run multiple CUDA algorithms from separate threads to the same card”?

I have been searching for an answer to this, because my program, which runs multiple CUDA algorithms from separate threads, crashed. Thanks.

Nobody has answered this question yet:

Yes, you can access the same card with multiple threads, but why would you want to? Two CUDA programs competing for resources on the card will inevitably run slower than running the two programs sequentially. The use of a job queuing system is needed here.

Separate cards must be accessed by separate threads (or processes) with the appropriate use of cudaSetDevice as mentioned by others already.
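To make the job-queue suggestion concrete, here is a minimal host-side sketch in plain C++, not taken from the SDK. It assumes one worker thread per device that runs queued jobs strictly one after another, so jobs submitted from any number of threads never compete for the same card. DeviceQueue and run_demo are hypothetical names, and the point where real code would call cudaSetDevice is only marked by a comment so the sketch compiles without the CUDA toolkit.

```cpp
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: one worker thread owns one device and runs the
// queued jobs in FIFO order, one at a time.
class DeviceQueue {
public:
    explicit DeviceQueue(int device)
        : device_(device), worker_([this] { run(); }) {}

    // Lets the worker finish every job already queued, then joins it.
    ~DeviceQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // May be called from any thread; the job runs later on the worker.
    void submit(std::function<void(int)> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        // Assumption: real code would call cudaSetDevice(device_) here,
        // once, before launching any work from this thread.
        for (;;) {
            std::function<void(int)> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutting down, nothing left
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job(device_);  // run outside the lock so submit() never blocks on a job
        }
    }

    int device_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void(int)>> jobs_;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};

// Demo: each hypothetical device gets its own queue. Returns, per device,
// the order in which the job ids ran.
std::vector<std::vector<int>> run_demo(int num_devices, int jobs_per_device) {
    std::vector<std::vector<int>> order(num_devices);
    {
        std::vector<std::unique_ptr<DeviceQueue>> queues;
        for (int d = 0; d < num_devices; ++d)
            queues.push_back(std::make_unique<DeviceQueue>(d));
        for (int j = 0; j < jobs_per_device; ++j)
            for (int d = 0; d < num_devices; ++d)
                queues[d]->submit([&order, j](int device) {
                    order[device].push_back(j);  // only this device's worker writes here
                });
    }  // queue destructors drain the remaining jobs and join the workers
    return order;
}
```

Calling run_demo(2, 3) feeds three jobs to each of two queues; each device's jobs run in submission order on that device's own thread.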

The fact that only one kernel executes at a time inside CUDA makes it nonsense to run multiple threads on the same CUDA context.

I agree with Linh Ha. How can two kernels with different grid and block size parameters be run simultaneously?

Aren’t the kernels queued when several CPU threads try to access the card?

They don’t run simultaneously. The closest you can get (if the card supports it) is to write your host code so that it copies data and executes the kernel using the async calls, which allows one kernel to execute while data is being copied to the card for the next one in the queue.
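The overlap described above is a double-buffering pattern. As a rough host-only analogy in plain C++ (not actual CUDA code), the sketch below starts staging the next batch before processing the current one. stage_batch and process_batch are hypothetical stand-ins for the copy and the kernel; on a card that supports it, the real thing would use the asynchronous copy calls such as cudaMemcpyAsync together with streams.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a host-to-device copy of one batch.
std::vector<int> stage_batch(int batch) {
    return std::vector<int>(4, batch);  // four copies of the batch number
}

// Hypothetical stand-in for a kernel: sums the staged data.
int process_batch(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0);
}

// Processes the batches in order while the next batch is staged in the background.
std::vector<int> run_pipeline(int num_batches) {
    std::vector<int> results;
    if (num_batches <= 0) return results;

    // Start the "copy" for batch 0.
    std::future<std::vector<int>> next =
        std::async(std::launch::async, stage_batch, 0);

    for (int b = 0; b < num_batches; ++b) {
        std::vector<int> current = next.get();  // wait for this batch's copy
        if (b + 1 < num_batches) {
            // Start the copy for the next batch before computing on this one.
            next = std::async(std::launch::async, stage_batch, b + 1);
        }
        results.push_back(process_batch(current));  // compute while the next copy runs
    }
    return results;
}
```

Only one batch is ever being processed at a time, which matches the "one kernel at a time" point; the gain comes purely from hiding the transfer behind the computation.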

I don’t personally have any application I want to run that way, but the above statement is like saying that ‘because a single-core CPU can only execute one instruction at a time, it is nonsense to run multiple threads on it’.