Hello there. I’m learning CUDA, and I cannot find the answers to some simple questions.
First, is a block of threads assigned to a single core or to a multiprocessor? For instance, I have a GTX 465. It has 11 multiprocessors (MPs) with 32 cores each, which gives 352 cores. Would it be better to launch 11 blocks or 352? There’s an example of this in the Programming Guide, but it doesn’t say how the MPs relate to all this.
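For what it's worth, the usual model is that a whole block is scheduled onto one multiprocessor (it is never split across MPs or pinned to a single core), an MP can hold several resident blocks at once, and a block's threads run in warps of 32 on that MP's cores. So the choice is not really "11 or 352": the common advice is to launch many more blocks than there are MPs, with a thread count per block that is a multiple of 32, and let the hardware distribute them. A minimal sketch, assuming a simple element-wise kernel; `scale` and `blocksFor` are hypothetical names for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread scales one element.
// A block is placed on a single MP; its threads execute there in warps of 32.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index of this thread
    if (i < n) data[i] *= factor;                   // the last block may be partially full
}

// Hypothetical helper: enough blocks to give every element its own thread.
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("%s: %d multiprocessors\n", prop.name, prop.multiProcessorCount);

    const int n = 1 << 20;
    const int threads = 256;                // a multiple of the warp size (32)
    int blocks = blocksFor(n, threads);     // 4096 blocks, far more than the MP count

    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    scale<<<blocks, threads>>>(d, 2.0f, n); // the scheduler spreads blocks over all MPs
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

Launching thousands of blocks lets each MP keep several blocks resident, which helps hide memory latency; the per-MP limits on resident blocks and threads depend on the compute capability, so `prop` is worth checking on a given card.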
Second, if I launch two kernels, one after the other, do they synchronize automatically, or do I have to call cudaThreadSynchronize()?
kernel1<<<nblocks, nthreads>>>(var1, var2, var3); // say that var3 is changed here
// Do I have to put cudaThreadSynchronize() here?
kernel2<<<nblocks, nthreads>>>(var1, var3, var4, var5);
Note that there’s a dependency here: kernel2 uses var3, which kernel1 modifies.
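As a rough sketch of the answer: kernels launched into the same stream (here the default stream, since no stream is given) execute in launch order, so kernel2 should not start until kernel1 has finished and no explicit synchronization is needed between them. A host-side sync matters only when the CPU needs the results, and a blocking `cudaMemcpy` already waits for earlier work in the default stream. Also, `cudaThreadSynchronize()` is deprecated in favour of `cudaDeviceSynchronize()`. The kernels and the `pipelineValue` helper below are hypothetical stand-ins for the ones in the question:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for kernel1: writes var3.
__global__ void kernel1(int *var3, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) var3[i] = i;
}

// Hypothetical stand-in for kernel2: reads what kernel1 wrote.
__global__ void kernel2(const int *var3, int *var4, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) var4[i] = 2 * var3[i];
}

// Hypothetical helper: runs both kernels and returns var4[idx] on the host.
int pipelineValue(int idx) {
    const int n = 1024, threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int *d3, *d4;
    cudaMalloc(&d3, n * sizeof(int));
    cudaMalloc(&d4, n * sizeof(int));

    // Same (default) stream: kernel2 does not begin until kernel1 completes,
    // so no cudaDeviceSynchronize() is needed between the two launches.
    kernel1<<<blocks, threads>>>(d3, n);
    kernel2<<<blocks, threads>>>(d3, d4, n);

    // A blocking cudaMemcpy waits for the preceding kernels to finish.
    int h4[n];
    cudaMemcpy(h4, d4, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d3);
    cudaFree(d4);
    return h4[idx];
}

int main() {
    printf("var4[10] = %d\n", pipelineValue(10));
    return 0;
}
```

The ordering guarantee applies within one stream; if the kernels were launched into different non-default streams, an event or an explicit synchronization would be needed to enforce the dependency.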
Thanks in advance.