Has anyone thought about the possibility of grouping the multiprocessors and running a different task on each group of cores? That would mean running the applications (and therefore their kernels) concurrently. For example, an executable would pick only 16 cores out of 240, so several could run at the same time.
Context switching may help utilize the GPU better, but as far as I know only one kernel from one task can run in each time slice, no matter how many cores you have.
Yup, I am looking at the Fermi spec at the moment. But what I see is that:
- concurrent kernels must come from the same application only;
- blocks are assigned to SMs by the chip-level scheduler, not under user control.
“On the Fermi architecture,
different kernels of the same CUDA context can execute concurrently, allowing maximum
utilization of GPU resources. Kernels from different application contexts can still run
sequentially with great efficiency thanks to the improved context switching performance.”
Concurrent kernel execution only benefits applications whose kernels can be launched in parallel.
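To illustrate what "kernels that can be launched in parallel" means in practice: within one CUDA context, independent kernels submitted to different non-default streams have no implicit ordering, so on Fermi the hardware may overlap them. A minimal sketch (kernelA/kernelB and the sizes are just placeholders):

```cuda
#include <cuda_runtime.h>

__global__ void kernelA(float *a) { a[threadIdx.x] += 1.0f; }
__global__ void kernelB(float *b) { b[threadIdx.x] *= 2.0f; }

int main()
{
    float *dA, *dB;
    cudaMalloc(&dA, 32 * sizeof(float));
    cudaMalloc(&dB, 32 * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Kernels in different non-default streams are independent;
    // on Fermi the scheduler is free to run them concurrently.
    kernelA<<<1, 32, 0, s0>>>(dA);
    kernelB<<<1, 32, 0, s1>>>(dB);

    cudaDeviceSynchronize();

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(dA);
    cudaFree(dB);
    return 0;
}
```

Note that both kernels still belong to the same context, which is exactly the limitation discussed above.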
I do that in my projects, at warp level and SM level:
you have to write different code paths and select which one to execute depending on your thread number :-)
For example at warp level, I use one warp for global-memory access: writing back results from shared memory and prefetching task info from global memory into shared (maintaining two FIFOs in shared memory), while the other warps just compute using registers and shared memory.
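A simplified sketch of that warp-specialization idea, assuming a single staging buffer instead of the two-FIFO scheme described above (TILE, NWARPS and process() are illustrative names, not from the original code):

```cuda
#include <cuda_runtime.h>

#define TILE   256
#define NWARPS 8   // assumes blockDim.x == NWARPS * 32

__device__ float process(float v) { return v * v; }  // placeholder compute

__global__ void warpSpecialized(const float *in, float *out, int ntiles)
{
    __shared__ float buf[TILE];
    int warp = threadIdx.x / 32;
    int lane = threadIdx.x % 32;

    for (int t = 0; t < ntiles; ++t) {
        if (warp == 0) {
            // memory warp: stage the next tile into shared memory
            for (int i = lane; i < TILE; i += 32)
                buf[i] = in[t * TILE + i];
        }
        __syncthreads();
        if (warp != 0) {
            // compute warps: work only on registers and shared memory
            for (int i = (warp - 1) * 32 + lane; i < TILE;
                 i += (NWARPS - 1) * 32)
                out[t * TILE + i] = process(buf[i]);
        }
        __syncthreads();
    }
}
```

The branch on the warp index is the "select which path you execute depending on your thread number" part; since it is uniform within a warp, it causes no divergence.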
At kernel level, I sometimes dedicate one SM to organization tasks: FIFO management, and even mapped pinned memory exchange to enable real-time communication with the host CPU :-)
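The mapped pinned memory part can be sketched as follows: the host allocates zero-copy memory with cudaHostAlloc, gets a device pointer to it, and a running kernel polls it, so host writes become visible without a memcpy. This is a minimal signalling example, not the FIFO machinery described above (names are illustrative):

```cuda
#include <cuda_runtime.h>

__global__ void waitForHost(volatile int *flag, int *result)
{
    // one "organization" thread polls the zero-copy flag
    if (threadIdx.x == 0) {
        while (*flag == 0) { /* spin until the host signals */ }
        *result = 1;
    }
}

int main()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);

    volatile int *hFlag;
    cudaHostAlloc((void **)&hFlag, sizeof(int), cudaHostAllocMapped);
    *hFlag = 0;

    int *dFlag, *dResult;
    cudaHostGetDevicePointer((void **)&dFlag, (void *)hFlag, 0);
    cudaMalloc(&dResult, sizeof(int));

    waitForHost<<<1, 32>>>((volatile int *)dFlag, dResult);
    *hFlag = 1;               // visible to the still-running kernel
    cudaDeviceSynchronize();

    cudaFree(dResult);
    cudaFreeHost((void *)hFlag);
    return 0;
}
```

Busy-waiting like this ties up the block, which is why dedicating one SM (one resident block) to it, as described above, keeps the rest of the chip free for computing.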
The main problem is making the tasks use variables with the same names, to be sure separate registers won't be allocated for each task (too many registers). So I name my registers rxx (r00 … r99) and map the needed variables onto them manually with #define. Not really the simplest way, but really efficient.
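A sketch of that manual register-mapping trick, assuming two tasks sharing one pool of named registers (the task bodies and names are illustrative):

```cuda
#include <cuda_runtime.h>

__global__ void multitask(float *data)
{
    float r00, r01;   // shared register pool, reused by every task

    if (threadIdx.x < 32) {
        // task A: alias the pool to meaningful names
        #define accum r00
        #define scale r01
        scale = 2.0f;
        accum = scale * data[threadIdx.x];
        data[threadIdx.x] = accum;
        #undef accum
        #undef scale
    } else {
        // task B: reuse the same variables under different names,
        // so the compiler allocates one register set, not two
        #define left  r00
        #define right r01
        left  = data[threadIdx.x];
        right = data[threadIdx.x - 32];
        data[threadIdx.x] = left + right;
        #undef left
        #undef right
    }
}
```

Since both branches touch only r00/r01, the register pressure stays that of one task rather than the sum of all tasks.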
Thanks for the reply, iPAX. I guess you are talking about a single problem set launched at the same time. What I am talking about here is the application level, e.g. an Excel XLL calling a CUDA Monte Carlo application while another C++ CUDA app does some visual simulation.
Fermi looks powerful, but it still cannot run kernels from different applications concurrently.