I wrote a basic program with a kernel and tested it on a dual-core CPU. I wanted to see thread synchronization, but the execution did not look parallel: each thread began only after another had ended, so they seemed to run sequentially. Normally, when I invoke a kernel such as
__global__ void myKernel(int *a, int *b)
with the statement
"myKernel<<<1, 3>>>(a, b)", does it create three threads? And do they run in parallel or sequentially? In my program the threads appear to run sequentially. What is wrong with this?
Thanks!
Emulation mode is for debugging purposes, not a proper (or high performance) implementation of the CUDA parallel thread execution model.
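To make the launch from the question concrete, here is a minimal sketch, not a definitive implementation. The launch configuration <<<1, 3>>> asks for one block of three threads, so the kernel body runs three times with threadIdx.x equal to 0, 1 and 2. The kernel body and the helper runThreeThreads are assumptions made up for illustration, since the original post does not show them. On a GPU the three threads run in parallel; under emulation the runtime may well schedule them one after another on the host, which would match what was observed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One block of three threads: threadIdx.x takes the values 0, 1 and 2.
// The body is a hypothetical example; the original post does not show one.
__global__ void myKernel(int *a, int *b)
{
    int i = threadIdx.x;
    b[i] = a[i] * 2;  // each thread writes only its own element
}

// Hypothetical helper: copies 3 ints to the device, launches myKernel
// with <<<1, 3>>>, and copies the result back. Returns 0 on success.
int runThreeThreads(const int *hostIn, int *hostOut)
{
    const int n = 3;
    const size_t bytes = n * sizeof(int);
    int *devA = NULL;
    int *devB = NULL;

    if (cudaMalloc((void **)&devA, bytes) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&devB, bytes) != cudaSuccess) {
        cudaFree(devA);
        return 1;
    }

    cudaMemcpy(devA, hostIn, bytes, cudaMemcpyHostToDevice);

    // 1 block, 3 threads per block.
    myKernel<<<1, n>>>(devA, devB);

    // This copy waits for the kernel to finish before reading the result.
    cudaError_t err = cudaMemcpy(hostOut, devB, bytes, cudaMemcpyDeviceToHost);

    cudaFree(devA);
    cudaFree(devB);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    int in[3] = {1, 2, 3};
    int out[3] = {0, 0, 0};

    if (runThreeThreads(in, out) != 0) {
        printf("CUDA call failed\n");
        return 1;
    }
    printf("b = %d %d %d\n", out[0], out[1], out[2]);
    return 0;
}
```

Because each thread touches only its own element, the result does not depend on the order in which the threads run. That is why the same code gives the same answer whether the threads execute in parallel or one at a time.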
Also, I used CUDA 2.0. Does it support CPU emulation, or must I use CUDA 2.1 for CPU emulation mode?
Hmm, I am afraid you will have to wait for CUDA 2.2 or 3.0.
Emulation mode (by this, I mean compiling with nvcc --device-emulation) works in CUDA 0.8, 0.9, 1.0, 1.1, 2.0, 2.1 and all beta versions.
If you are asking about the MCUDA-esque --multicore compilation (which would be high performance), this was mentioned as a feature coming in 2.1 but was dropped without explanation. One hopes that it will be coming soon. I certainly do!