One host thread is tied to one GPU context, unless you are using the context switching in the 2.0 beta driver API, which is meant for sharing contexts among libraries and applications.
Each GPU context has its own protected memory space, and device pointers cannot be shared between them. The GPU a context is assigned to is selected with cudaSetDevice(). Once a host thread is associated with a context, it can’t “see” anything on the device outside of its own context, so there is no way for a single host thread to control more than one GPU.
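To make that concrete, the usual workaround is one host thread per GPU, with each thread calling cudaSetDevice() before any other CUDA call. Here is a rough sketch (not my actual code; the gpuWorker name is just made up for illustration), using boost::thread since boost is already in the picture:

#include <cuda_runtime.h>
#include <boost/thread.hpp>

// Each worker thread owns exactly one context; d_data is only valid inside it.
void gpuWorker(int device)
{
    cudaSetDevice(device);  // must come before any other CUDA call in this thread
    float* d_data = 0;
    cudaMalloc((void**)&d_data, 1024 * sizeof(float));
    // ... cudaMemcpy, kernel launches, etc. using d_data go here ...
    cudaFree(d_data);
}

int main()
{
    boost::thread t0(gpuWorker, 0);  // one host thread per GPU
    boost::thread t1(gpuWorker, 1);
    t0.join();
    t1.join();
    return 0;
}

Neither thread can touch the other’s allocations, which is exactly the limitation described above.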
That isn’t to say that having a single host thread control multiple GPUs wouldn’t be convenient. I plan to get that effect in my own code using worker threads and function delegates, i.e., once I write the code I will be able to do something like this from one thread:
gpu1->call(bind(cudaSetDevice, 0));
gpu2->call(bind(cudaSetDevice, 1));
gpu1->call(bind(cudaMalloc, &d_gpu1, other args));
gpu2->call(bind(cudaMalloc, &d_gpu2, other args));
gpu1->call(bind(cudaMemcpy, d_gpu1, other args));
gpu2->call(bind(cudaMemcpy, d_gpu2, other args));
gpu1->call(bind(runKernel, d_gpu1, other args));
gpu2->call(bind(runKernel, d_gpu2, other args));
gpu1 and gpu2 are the worker threads. “call” just pushes the function call, bound up by boost::bind, onto a queue; the worker thread pulls the calls off the queue and executes them. It will be a bit heavy on the requirements (C++ host code linked against the boost library), but as you can see the syntax is pretty slick, and any function can be passed into the queue. Another upshot is that every call() will automatically be the equivalent of CUDA_SAFE_CALL, throwing an exception if an error is reported (in debug mode only).
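For anyone curious, here is a rough sketch of what the worker class might look like (untested, names invented, and the CUDA_SAFE_CALL-style error checking described above is left out for brevity):

#include <queue>
#include <boost/thread.hpp>
#include <boost/function.hpp>
#include <boost/bind.hpp>

class GPUWorker
{
public:
    GPUWorker() : m_done(false), m_thread(boost::bind(&GPUWorker::loop, this)) {}

    ~GPUWorker()
    {
        {
            boost::mutex::scoped_lock lock(m_mutex);
            m_done = true;
        }
        m_cond.notify_one();
        m_thread.join();
    }

    // Push any nullary delegate (e.g. boost::bind(cudaSetDevice, 0)) onto the queue.
    // boost::function<void ()> discards the cudaError_t return value; the error
    // checking would wrap f before it is queued.
    void call(const boost::function<void ()>& f)
    {
        boost::mutex::scoped_lock lock(m_mutex);
        m_queue.push(f);
        m_cond.notify_one();
    }

private:
    // Runs on the worker thread, so every queued call executes in that thread's context.
    void loop()
    {
        for (;;)
        {
            boost::function<void ()> f;
            {
                boost::mutex::scoped_lock lock(m_mutex);
                while (m_queue.empty() && !m_done)
                    m_cond.wait(lock);
                if (m_queue.empty())
                    return;  // shutting down and the queue is drained
                f = m_queue.front();
                m_queue.pop();
            }
            f();
        }
    }

    bool m_done;
    std::queue<boost::function<void ()> > m_queue;
    boost::mutex m_mutex;
    boost::condition_variable m_cond;
    boost::thread m_thread;
};

gpu1 and gpu2 from the snippet above would then just be GPUWorker instances (or pointers to them).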
If anyone is interested, the code will be open sourced once I write it.
Yes, there will be some overhead in calling functions with boost::bind and passing them to worker threads. However, 1) my application targets a maximum of thousands of calls per second, which shouldn’t be a problem (… I hope … will test), and 2) if the GPU is kept busy ~100% of the time, much of the cost of queuing up the function delegates will just be overlapped with GPU execution and effectively cost nothing in the end.