Multiple threads using single Tesla

PeterW · March 27, 2009, 8:31am

Hello,

I am considering the following scenario: an interrupt service routine, locked to one core of the host CPU, transfers data from a PCIe DAQ card at 60 kHz into a ring buffer in the device memory of a single Tesla (either by copying into host RAM first and then doing a host-to-device transfer or, preferably, through some DMA magic). Two application threads run on the host, each locked to its own CPU core, and each works on different aspects of the same global ring buffer on that single Tesla. Is it possible for the two threads to launch kernels in parallel, each targeted at a different subset of the processors on the Tesla (e.g. thread 1 uses processors 0-63, thread 2 uses processors 64-127, etc.)? In other words, can the threads share one Tesla, each using only a portion of it?
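A minimal sketch of the first option (stage a block of samples in host RAM, then copy it into the next slot of a ring buffer in device memory), written against the CUDA runtime API. `DeviceRing`, `ringInit`, `ringPush` and `ringFree` are hypothetical names for illustration; the ISR, the DAQ card and any direct DMA path are not modeled. At 60 kHz one would presumably batch many samples per slot rather than issue one copy per sample, and allocate the staging buffer as page-locked memory (`cudaHostAlloc`) so the GPU can DMA from it directly.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical device-resident ring buffer: `slots` fixed-size blocks of
// `slotSize` samples each, filled from the host in arrival order.
struct DeviceRing {
    float* d_data = nullptr;
    size_t slotSize = 0;
    size_t slots = 0;
    size_t head = 0;  // total number of blocks written so far
};

// Allocate slots * slotSize floats on the current device.
bool ringInit(DeviceRing& r, size_t slots, size_t slotSize) {
    assert(slots > 0 && slotSize > 0);
    r.slots = slots;
    r.slotSize = slotSize;
    r.head = 0;
    return cudaMalloc(&r.d_data, slots * slotSize * sizeof(float)) == cudaSuccess;
}

// Copy one block already staged in host RAM into the next slot, overwriting
// the oldest slot once the ring is full. Returns the slot used, or -1 on error.
long ringPush(DeviceRing& r, const float* h_block) {
    size_t slot = r.head % r.slots;
    float* dst = r.d_data + slot * r.slotSize;
    if (cudaMemcpy(dst, h_block, r.slotSize * sizeof(float),
                   cudaMemcpyHostToDevice) != cudaSuccess)
        return -1;
    ++r.head;
    return static_cast<long>(slot);
}

void ringFree(DeviceRing& r) {
    cudaFree(r.d_data);
    r.d_data = nullptr;
}
```

Kernels launched by the consumer threads would then read slots by index, e.g. `r.d_data + slot * r.slotSize`.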

Hints and suggestions are highly appreciated,
peter

seibert · March 27, 2009, 1:57pm

Are you talking about a single Tesla card, like the C1060? As far as CUDA is concerned, that is one monolithic device, and only one kernel can run at a time on it. You cannot partition the stream processors into subsets and run different kernels on each subset.
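A small sketch of what the CUDA runtime actually reports: a C1060 is listed as one device (30 multiprocessors, 240 stream processors), and the `<<<grid, block, sharedMem, stream>>>` launch syntax has no argument for choosing which multiprocessors run a kernel. The `concurrentKernels` property is from later toolkits than the one discussed here; it reads 0 on a C1060. `listDevices` is a hypothetical helper name.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Print every device the runtime sees and return how many there are
// (0 if the query fails, e.g. no driver or no GPU present).
int listDevices() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // A C1060 appears here as ONE entry; its processors cannot be split
        // into separately addressable sub-devices.
        std::printf("device %d: %s, %d multiprocessors, concurrentKernels=%d\n",
                    dev, prop.name, prop.multiProcessorCount,
                    prop.concurrentKernels);
    }
    return count;
}
```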

If you are talking about the S1070, that is actually four C1060 cards in a rackmount case, and appears to the driver as four separate devices. Each of those devices can be used independently.
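A minimal sketch of driving several devices independently, as with the four devices of an S1070: one host thread per device, each selecting its device with `cudaSetDevice` and using its own memory. It uses `std::thread` and the current runtime API; code from 2009 would typically have used pthreads, but the one-host-thread-per-device pattern is the same. `fillKernel`, `runOnDevice` and `runOnAllDevices` are hypothetical names.

```cuda
#include <cassert>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Trivial kernel: element i becomes i + offset.
__global__ void fillKernel(int* out, int n, int offset) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i + offset;
}

// Work done by one host thread on one device; returns true if results check out.
static bool runOnDevice(int dev, int n) {
    assert(n > 0);
    if (cudaSetDevice(dev) != cudaSuccess) return false;
    int* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return false;
    int offset = dev * 1000;  // tag each device's results with its id
    fillKernel<<<(n + 255) / 256, 256>>>(d, n, offset);
    std::vector<int> h(n);
    bool ok = cudaMemcpy(h.data(), d, n * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    for (int i = 0; ok && i < n; ++i) ok = (h[i] == i + offset);
    return ok;
}

// One host thread per visible device; returns how many devices succeeded.
int runOnAllDevices(int n) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    std::vector<char> ok(count, 0);  // char, not bool: each thread writes its own element
    std::vector<std::thread> threads;
    for (int dev = 0; dev < count; ++dev)
        threads.emplace_back([&ok, dev, n] { ok[dev] = runOnDevice(dev, n); });
    for (auto& t : threads) t.join();
    int good = 0;
    for (char c : ok) good += c;
    return good;
}
```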

PeterW · March 27, 2009, 2:55pm

Hello,

Thanks for the clarification. Yes, implicitly I was thinking of a C1060 card. So this means I would need a separate Tesla device for every host thread that launches kernels.

Thanks again,
peter

seibert · March 27, 2009, 5:59pm

It doesn’t sound like this applies to your situation, but you can have two threads (or two different processes) use the same CUDA device. The kernel calls will be time-sliced, though, which will increase latency and might have no benefit if your usage of the GPU is near 100% of the time with a single process.
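A minimal sketch of two host threads launching kernels on the same device (device 0). It assumes the current CUDA runtime (4.0 and later), in which all host threads of one process share the device's primary context; the 2009-era runtime discussed here gave each host thread its own context instead. Each thread uses its own buffer and stream. Whether the two kernels actually overlap depends on the hardware: a C1060 would still run them one at a time, while newer GPUs with concurrent-kernel support may overlap them. Separate processes get separate contexts and, by default, are time-sliced. `addKernel`, `worker` and `shareOneDevice` are hypothetical names.

```cuda
#include <cassert>
#include <thread>
#include <cuda_runtime.h>

// Busy kernel: adds `value` to every element `iters` times.
__global__ void addKernel(float* data, int n, float value, int iters) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        for (int k = 0; k < iters; ++k) data[i] += value;
}

// One host thread's share of the work: its own buffer and stream on device `dev`.
static void worker(int dev, int n, float value, bool* ok) {
    *ok = false;
    if (cudaSetDevice(dev) != cudaSuccess) return;
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return;
    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) == cudaSuccess) {
        cudaMemsetAsync(d, 0, n * sizeof(float), stream);  // all bytes 0 -> 0.0f
        addKernel<<<(n + 255) / 256, 256, 0, stream>>>(d, n, value, 100);
        float first = -1.0f;
        cudaMemcpyAsync(&first, d, sizeof(float), cudaMemcpyDeviceToHost, stream);
        *ok = cudaStreamSynchronize(stream) == cudaSuccess && first == value * 100;
        cudaFree(d);
    }
    cudaStreamDestroy(stream);
}

// Two host threads share device 0; returns true if both got correct results.
bool shareOneDevice(int n) {
    assert(n > 0);
    bool ok1 = false, ok2 = false;
    std::thread t1(worker, 0, n, 1.0f, &ok1);
    std::thread t2(worker, 0, n, 2.0f, &ok2);
    t1.join();
    t2.join();
    return ok1 && ok2;
}
```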

I bring this up because I recently noticed that the efficiency cost of having a CUDA device constantly switch between two processes is much lower than I remembered. (I hadn't checked this since before CUDA 1.0.) I have a program with a GPU duty cycle of about 30%, and was pleasantly surprised to see that two processes could share the GPU with negligible slowdown compared to each process running alone.
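A back-of-the-envelope reading of this observation, where $U_i$ is the fraction of wall-clock time process $i$ keeps the GPU busy when running alone. This is a rough estimate that ignores context-switch overhead and assumes the two processes' GPU bursts do not systematically collide:

$$
\begin{aligned}
U_1 + U_2 &\approx 0.3 + 0.3 = 0.6 < 1 &&\Rightarrow \text{the idle time absorbs the other process's kernels: near-solo speed}\\
U_1 + U_2 &\approx 1.0 + 1.0 = 2.0 > 1 &&\Rightarrow \text{each process runs at roughly } 1/2 \text{ of its solo speed}
\end{aligned}
$$

The second line is the case where sharing the device brings no benefit.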
