Could you please post a sample code on

sharath · February 26, 2011, 4:13am

How do multiple kernels run concurrently on Fermi? Please post some sample code.

Also, about the L1 and L2 caches on Fermi: are they configurable like shared memory?

Thanks in advance

sharath · March 4, 2011, 8:58am

OK, if not sample code, please tell me whether this is the right way to do it.

I am launching two kernels without calling cudaThreadSynchronize() in between:

scanfirst<<<1,n>>>(first_d,n,index1);

scansecond<<<1,n>>>(second_d,n,index2);

Is this the correct way of launching multiple kernels?

Sarnath · March 4, 2011, 9:16am

Multiple kernels within the same context can run concurrently on Fermi only if they are launched in different streams. Both of your launches go into the default stream, so they execute one after the other.
Check the streams concept in CUDA.
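
A minimal sketch of launching two kernels in different streams follows. The kernel addConstant and the helper runTwoStreams are hypothetical stand-ins for scanfirst and scansecond, whose bodies were not posted. Separate streams only allow the hardware to overlap the kernels; whether they actually overlap depends on the device and on the resources each kernel uses. cudaDeviceSynchronize() assumes CUDA 4.0 or later; on older toolkits cudaThreadSynchronize() plays the same role.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for scanfirst/scansecond: adds `value` to each element.
__global__ void addConstant(int *data, int n, int value)
{
    int i = threadIdx.x;
    if (i < n) data[i] += value;
}

// Launches the kernel twice, once per stream, on two separate buffers.
// Returns true if both buffers hold the expected values afterwards.
bool runTwoStreams(int n)
{
    // A <<<1,n>>> launch needs n to fit in one block (1024 threads on Fermi).
    if (n <= 0 || n > 1024) return false;

    size_t bytes = n * sizeof(int);
    int *first_d = NULL, *second_d = NULL;
    if (cudaMalloc((void **)&first_d, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&second_d, bytes) != cudaSuccess) {
        cudaFree(first_d);
        return false;
    }
    cudaMemset(first_d, 0, bytes);
    cudaMemset(second_d, 0, bytes);
    cudaDeviceSynchronize();  // make sure both buffers are zeroed before launching

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // The fourth launch parameter selects the stream; the third is the
    // dynamic shared memory size (none here).
    addConstant<<<1, n, 0, s1>>>(first_d, n, 1);
    addConstant<<<1, n, 0, s2>>>(second_d, n, 2);

    // Wait for each stream before reading the results back.
    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    std::vector<int> first_h(n), second_h(n);
    cudaMemcpy(&first_h[0], first_d, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(&second_h[0], second_d, bytes, cudaMemcpyDeviceToHost);

    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int i = 0; i < n && ok; ++i)
        ok = (first_h[i] == 1 && second_h[i] == 2);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(first_d);
    cudaFree(second_d);
    return ok;
}
```

Calling runTwoStreams(256) from main() exercises both launches. The two kernels work on separate buffers, so no ordering between the streams is needed.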

heshsham_India · March 4, 2011, 11:46am

Check the sample code in SDK 3.2: simpleStreams and concurrentKernels.

The Programming Guide and the Best Practices Guide explain how the stream APIs are used to overlap kernel execution with memory copies.

The same concept applies to concurrent kernel execution. Look at the sample code for more detail.
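
As a rough illustration of the copy/kernel overlap pattern mentioned above (not the SDK sample itself), the sketch below splits an array into two chunks and gives each chunk its own stream. The kernel doubleElements and the helper runChunkedPipeline are hypothetical names. Asynchronous copies need page-locked host memory, which is why the host buffer comes from cudaMallocHost.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: doubles every element of its chunk.
__global__ void doubleElements(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Processes two chunks of `chunk` floats, one per stream.
// Returns true if every element was doubled.
bool runChunkedPipeline(int chunk)
{
    if (chunk <= 0) return false;
    const int kStreams = 2;
    const int total = kStreams * chunk;
    const size_t chunkBytes = chunk * sizeof(float);

    float *host = NULL, *dev = NULL;
    // Page-locked host memory is required for cudaMemcpyAsync to be asynchronous.
    if (cudaMallocHost((void **)&host, total * sizeof(float)) != cudaSuccess)
        return false;
    if (cudaMalloc((void **)&dev, total * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(host);
        return false;
    }
    for (int i = 0; i < total; ++i) host[i] = (float)i;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Within a stream the three operations run in order; work in one stream
    // may overlap with work in the other.
    for (int s = 0; s < kStreams; ++s) {
        int offset = s * chunk;
        cudaMemcpyAsync(dev + offset, host + offset, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        doubleElements<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(dev + offset, chunk);
        cudaMemcpyAsync(host + offset, dev + offset, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    for (int s = 0; s < kStreams; ++s) {
        cudaStreamSynchronize(streams[s]);
        cudaStreamDestroy(streams[s]);
    }

    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int i = 0; i < total && ok; ++i)
        ok = (host[i] == 2.0f * (float)i);

    cudaFree(dev);
    cudaFreeHost(host);
    return ok;
}
```

For example, runChunkedPipeline(1024) processes 2048 floats in two streams. How much overlap actually occurs depends on the device's copy engines and on how large each chunk is.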