Different kernel launches, be they from one or multiple host threads, are executed one at a time on the device. While intermingling different kernel launches might seem like a good idea at first, a number of memory and synchronization issues crop up, bringing efficiency down.
In most cases, rethinking the parallelization approach helps. Perhaps you can have a third host thread, which will be the only host thread communicating with the CUDA device. The other two host threads would then fill out the data structures and signal the third thread to launch CUDA kernels and memcopies.
Can you describe your application in more detail? If you don’t want to disclose details publicly, you can send me a message.