Run different kernel functions on different multiprocessors simultaneously

Is it possible to assign different kernels to different multiprocessors (on the same graphics card) and run different kernel functions simultaneously with CUDA 2.3? I am looking for a way to run slightly different applications that access the same data in global memory at the same time. Searching through the forum, I found a post from around October 2007 indicating that this is not possible.

I would like to know whether this is possible now with CUDA 2.3, and if so, how to do it.


This is currently not possible. However, you can have several __device__ functions that are called from the same __global__ function, for example:

__device__ void function1(void* in) { /* ... */ }

__device__ void function2(void* in) { /* ... */ }

__global__ void dispatch(void* in)
{
    // Blocks 17 and above run function1; blocks 0-16 run function2.
    if (blockIdx.x > 16) function1(in);
    else function2(in);
}

The problem is that you have to manually determine how functions are assigned to blocks.
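A slightly fuller sketch of the same dispatch pattern, assuming two hypothetical tasks that read one shared input array in global memory and write to separate outputs. The split point is passed as a kernel parameter instead of being hard-coded, and each task renumbers its blocks from zero. All names here (`microkernelFor`, `scaleTask`, `offsetTask`, `dispatchTwoTasks`, `launchDispatch`) and the two toy operations are illustrative assumptions. Blocks of the two tasks can be resident on different multiprocessors at the same time, but the hardware scheduler, not this code, decides where and when each block runs.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical selector: blocks [0, firstBlockOfB) run task A, the rest run task B.
// Marked __host__ __device__ so the same mapping can also be checked on the CPU.
__host__ __device__ inline int microkernelFor(int block, int firstBlockOfB)
{
    return block < firstBlockOfB ? 0 : 1;
}

// Hypothetical task A: doubles each element of the shared input.
__device__ void scaleTask(const float* in, float* outA, int n, int localBlock)
{
    int i = localBlock * blockDim.x + threadIdx.x;
    if (i < n) outA[i] = 2.0f * in[i];
}

// Hypothetical task B: adds 1 to each element of the same input.
__device__ void offsetTask(const float* in, float* outB, int n, int localBlock)
{
    int i = localBlock * blockDim.x + threadIdx.x;
    if (i < n) outB[i] = in[i] + 1.0f;
}

__global__ void dispatchTwoTasks(const float* in, float* outA, float* outB,
                                 int n, int firstBlockOfB)
{
    int block = blockIdx.x;
    if (microkernelFor(block, firstBlockOfB) == 0)
        scaleTask(in, outA, n, block);                   // blocks 0 .. firstBlockOfB-1
    else
        offsetTask(in, outB, n, block - firstBlockOfB);  // renumbered from 0
}

// d_in, d_outA and d_outB are assumed to be device arrays of n floats.
cudaError_t launchDispatch(const float* d_in, float* d_outA, float* d_outB, int n)
{
    assert(n > 0);  // a zero-block launch would be an invalid configuration
    const int threads = 256;
    const int blocksPerTask = (n + threads - 1) / threads;
    // One grid holds both tasks: the first half of the blocks do A, the second half do B.
    dispatchTwoTasks<<<2 * blocksPerTask, threads>>>(d_in, d_outA, d_outB, n, blocksPerTask);
    return cudaGetLastError();
}
```

Passing the split as a parameter keeps the block-to-task assignment in one place, so changing the balance between the tasks does not require editing the kernel.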

Fermi is supposed to add support for this, but I’d bet that you will have to launch the two kernels in different streams.
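For reference, here is a minimal sketch of launching two separate kernels into two non-default streams, written against a current CUDA runtime rather than 2.3. The kernels `kernelA` and `kernelB`, the helper `blocksFor` and the wrapper `launchConcurrently` are hypothetical. The kernels can only overlap on a device that supports concurrent kernel execution (compute capability 2.0, i.e. Fermi, or later) and when enough resources are free; using separate streams permits overlap but does not guarantee it.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernels that read the same input and write separate outputs.
__global__ void kernelA(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

__global__ void kernelB(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

// Number of blocks needed to cover n elements with `threads` threads per block.
inline int blocksFor(int n, int threads) { return (n + threads - 1) / threads; }

// d_in, d_outA and d_outB are assumed to be device arrays of n floats.
cudaError_t launchConcurrently(const float* d_in, float* d_outA, float* d_outB, int n)
{
    assert(n > 0);
    cudaStream_t sA, sB;
    cudaError_t err = cudaStreamCreate(&sA);
    if (err != cudaSuccess) return err;
    err = cudaStreamCreate(&sB);
    if (err != cudaSuccess) { cudaStreamDestroy(sA); return err; }

    const int threads = 256;
    const int blocks = blocksFor(n, threads);
    // Launches in different non-default streams have no ordering between them,
    // so the hardware is free to run them at the same time.
    kernelA<<<blocks, threads, 0, sA>>>(d_in, d_outA, n);
    kernelB<<<blocks, threads, 0, sB>>>(d_in, d_outB, n);
    cudaError_t launchErr = cudaGetLastError();

    cudaError_t syncA = cudaStreamSynchronize(sA);
    cudaError_t syncB = cudaStreamSynchronize(sB);
    cudaStreamDestroy(sA);
    cudaStreamDestroy(sB);

    if (launchErr != cudaSuccess) return launchErr;
    return syncA != cudaSuccess ? syncA : syncB;
}
```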

Gregory’s completely correct. This kind of micro-kernel switching works pretty well, actually, and I do it often in some of my more complex code.
The main disadvantage is actually compile time: you can’t split the subkernels into separate object files, so all of them have to be compiled together as one unit.

You can even go finer and do similar switching at the per-warp level, though you give up the ability to use __syncthreads() to coordinate among the warps. However, if your microkernels really are per-warp, you won’t need __syncthreads().
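A minimal sketch of per-warp switching, under a few stated assumptions: 32-thread warps, a block size that is a multiple of 32, and a hypothetical layout in which even-numbered warps run task A and odd-numbered warps run task B, with each pair of warps covering the same 32-element chunk. The helpers `taskForWarp` and `chunkForWarp`, the kernel `warpDispatch` and the wrapper `launchWarpDispatch` are illustrative names. Because whole warps take different branches, the kernel uses no __syncthreads().

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;  // assumption: 32-thread warps

// Hypothetical mapping: warps 2k and 2k+1 share chunk k; even warps run task 0, odd run task 1.
__host__ __device__ inline int taskForWarp(int globalWarp)  { return globalWarp % 2; }
__host__ __device__ inline int chunkForWarp(int globalWarp) { return globalWarp / 2; }

__global__ void warpDispatch(const float* in, float* outA, float* outB, int n)
{
    // Assumes blockDim.x is a multiple of kWarpSize, so every warp is complete.
    int warpInBlock = threadIdx.x / kWarpSize;
    int lane        = threadIdx.x % kWarpSize;
    int globalWarp  = blockIdx.x * (blockDim.x / kWarpSize) + warpInBlock;
    int i = chunkForWarp(globalWarp) * kWarpSize + lane;
    if (i >= n) return;  // early exit is safe because no block-wide barrier follows

    // The branch is uniform within a warp, so it causes no intra-warp divergence.
    if (taskForWarp(globalWarp) == 0)
        outA[i] = 2.0f * in[i];   // hypothetical microkernel A
    else
        outB[i] = in[i] + 1.0f;   // hypothetical microkernel B
}

// d_in, d_outA and d_outB are assumed to be device arrays of n floats.
cudaError_t launchWarpDispatch(const float* d_in, float* d_outA, float* d_outB, int n)
{
    assert(n > 0);
    const int threads       = 256;                          // 8 warps per block
    const int chunks        = (n + kWarpSize - 1) / kWarpSize;
    const int warps         = 2 * chunks;                   // one warp per task per chunk
    const int warpsPerBlock = threads / kWarpSize;
    const int blocks        = (warps + warpsPerBlock - 1) / warpsPerBlock;
    warpDispatch<<<blocks, threads>>>(d_in, d_outA, d_outB, n);
    return cudaGetLastError();
}
```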


I will design my application per your suggestion.

Have a nice holiday!