CUDA slower than CPU, help me please...

Hi, CUDA programmers,

I am trying to run an algorithm with CUDA on a Core i7 machine, using a 9800 GT and CUDA 2.2. The result is that processing on CUDA is slower than processing on the CPU. I think the cause is how I set up the thread blocks or the grid.

My algorithm:

for ( … 100 rounds … ) {
    for ( … 100,000 rounds … ) {
        … code …
        for ( … 10 rounds … ) {
            code
        }
        code
    }
}

In the 100,000-round and 10-round loops the data is a 1D array, and inside these loops there are many functions that do calculations on that array.

Right now I set:
threadsPerBlock = (1,10);
threadsPerGrid = (1,10); // block shape

My problem: I would like to choose a better block and thread configuration. Please help me …

(Do you understand my English? So sorry.)

threadsPerBlock = (1,10) is not a good setting. You should use about 192 threads per block to hide pipeline latency, if you have enough resources. You can use the CUDA Occupancy Calculator from the SDK/tools: once you enter your resource usage (register count per thread, shared memory per thread block), it gives you the number of active threads per multiprocessor.

By the way, could you post your kernel so that we can check why the GPU is slower than the CPU in your case?
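
To make the arithmetic behind the 192-thread advice concrete, here is a rough host-side sketch of how block size alone caps occupancy. The limits used (24 resident warps, i.e. 768 threads, and 8 resident blocks per multiprocessor) are the compute capability 1.1 figures, which I am assuming apply to the 9800 GT. Register and shared memory usage are ignored, so the result is only an upper bound, and the function name is made up for illustration.

```cuda
// Host-only sketch: upper bound on occupancy from block size alone.
// ASSUMPTION: compute capability 1.1 limits (assumed for the 9800 GT).
// Register and shared-memory usage are ignored, so the real occupancy
// reported by the Occupancy Calculator may be lower.
const int kWarpSize       = 32;
const int kMaxWarpsPerSM  = 24;  // 768 resident threads / 32
const int kMaxBlocksPerSM = 8;

// Hypothetical helper: percentage of warp slots filled per multiprocessor.
int occupancy_percent(int threadsPerBlock)
{
    // A block always occupies whole warps, even if some lanes sit idle.
    int warpsPerBlock = (threadsPerBlock + kWarpSize - 1) / kWarpSize;

    // Resident blocks are limited by warp slots and by the block limit.
    int blocksPerSM = kMaxWarpsPerSM / warpsPerBlock;
    if (blocksPerSM > kMaxBlocksPerSM) blocksPerSM = kMaxBlocksPerSM;

    return 100 * blocksPerSM * warpsPerBlock / kMaxWarpsPerSM;
}
```

Under these assumptions, 10 threads per block gives at most 33% (8 single-warp blocks out of 24 warp slots, and each warp has only 10 of its 32 lanes doing work), while 192 threads per block can reach 100% (4 blocks of 6 warps each).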

If by “threadsPerBlock” you mean you are launching your kernel with something like my_kernel<<<1,10>>>(args), which is a single block of 10 threads, then I agree with LSChien that you need to launch more threads, ideally a multiple of 32 (the warp size) per block. That said, it would be easier if we could see some of your code…
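
Without the real kernel this is only a guess, but a launch along the following lines is what the advice above amounts to. It assumes the 100,000 iterations of the middle loop are independent of each other, so each one can get its own thread, and that the 100 outer rounds have to run one after another. The names process_items, blocks_for and run_rounds and the arithmetic inside the kernel are placeholders, not the original code.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per iteration of the 100,000-round loop.
// ASSUMPTION: the iterations do not depend on each other.
__global__ void process_items(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;           // guard the partly filled last block

    float x = data[i];
    for (int k = 0; k < 10; ++k)  // stand-in for the inner 10-round loop
        x = x * 0.5f + 1.0f;      // placeholder arithmetic, not the real code
    data[i] = x;
}

// Number of blocks needed to cover n items (rounds up).
int blocks_for(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Launches the kernel once per outer round; d_data must be a device pointer.
void run_rounds(float *d_data, int n, int rounds)
{
    const int threadsPerBlock = 192;  // a multiple of the warp size (32)
    const int blocks = blocks_for(n, threadsPerBlock);

    for (int r = 0; r < rounds; ++r)
        process_items<<<blocks, threadsPerBlock>>>(d_data, n);

    // Wait for all rounds to finish. On old toolkits such as CUDA 2.2
    // the equivalent call was cudaThreadSynchronize().
    cudaDeviceSynchronize();
}
```

For n = 100,000 and 192 threads per block this launches 521 blocks, with the last one only partly filled, which is why the kernel checks i against n.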