Hi, CUDA programmers,
I am trying to run an algorithm with CUDA on a Core i7 machine, using a GeForce 9800 GT and CUDA 2.2. The CUDA version runs slower than the CPU version. I suspect the cause is my choice of thread-block and grid dimensions.
My algorithm looks like this:
for (… 100 rounds …) {
    for (… 100,000 rounds …) {
        … code …
        for (… 10 rounds …) {
            … code …
        }
        … code …
    }
}
Inside the 100,000-round and 10-round loops, the data is a 1D array, and these loops call many functions that compute on that array.
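A common way to map this loop nest onto the GPU is to turn the 100,000-round loop into threads (one thread per iteration), keep the 10-round loop as an ordinary loop inside each thread, and keep the 100-round loop on the host as repeated kernel launches. The sketch below shows that mapping under stated assumptions: iteration i of the middle loop reads and writes only element i of the array, and each outer round depends on the previous round's results. The names `roundKernel` and `runAllRounds` are hypothetical, and the update `x = 0.5f * x + 1.0f` is only a placeholder for the elided "… code …". The array is copied to the GPU once and stays there for all outer rounds, because copying it back and forth every round can easily cost more than the computation itself.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Hypothetical kernel standing in for one pass of the 100,000-round loop.
// ASSUMPTION: iteration i reads and writes only data[i], so iterations are
// independent and can run as parallel threads.
__global__ void roundKernel(float* data, int n, int innerRounds)
{
    // The global thread index replaces the old middle-loop counter.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;  // the last block may contain spare threads

    float x = data[i];                     // "... code ..." before the inner loop
    for (int k = 0; k < innerRounds; ++k)  // the 10-round loop stays serial per thread
        x = 0.5f * x + 1.0f;               // placeholder arithmetic (hypothetical)
    data[i] = x;                           // "... code ..." after the inner loop
}

// Runs all outer rounds on the GPU; h_data holds n floats on the host.
// ASSUMPTION: each outer round needs the previous round's results, so the
// outer loop stays on the host as one kernel launch per round.
cudaError_t runAllRounds(float* h_data, int n, int outerRounds, int innerRounds)
{
    assert(h_data != NULL && n > 0);
    const size_t bytes = (size_t)n * sizeof(float);
    float* d_data = NULL;

    cudaError_t err = cudaMalloc((void**)&d_data, bytes);
    if (err != cudaSuccess) return err;

    // Copy once and keep the array on the GPU for every outer round.
    err = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    const int threadsPerBlock = 256;  // a multiple of the 32-thread warp
    const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;

    for (int r = 0; r < outerRounds && err == cudaSuccess; ++r) {
        roundKernel<<<blocksPerGrid, threadsPerBlock>>>(d_data, n, innerRounds);
        err = cudaGetLastError();  // reports launch-configuration errors
    }

    // This blocking copy waits for the queued kernels to finish first.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return err;
}
```

For the sizes in the question, the call would be `runAllRounds(h_data, 100000, 100, 10)`, which launches 391 blocks of 256 threads per round instead of 100 threads in total. If the middle-loop iterations are not independent (for example, iteration i reads data[i - 1]), this mapping is not valid as written and the algorithm would need restructuring first.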
My current launch configuration is:
dim3 threadsPerBlock(1, 10);  // block shape: 1 x 10 = 10 threads per block
dim3 blocksPerGrid(1, 10);    // grid shape: 1 x 10 = 10 blocks
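With a 1 x 10 block and a 1 x 10 grid, a launch creates only 10 x 10 = 100 threads. A 9800 GT has 112 cores on 14 multiprocessors, and a block of 10 threads fills only 10 of the 32 lanes of its warp, so most of the GPU sits idle while the 100,000-round loop could supply far more parallel work. (Also, if the real code literally writes `threadsPerBlock = (1,10);` without `dim3`, C treats `(1,10)` as the comma operator and the value is just 10.) The sketch below computes a 1D configuration that gives every array element its own thread. The helper names are hypothetical, and 256 threads per block is an assumed starting point (a multiple of 32 and below the 512-thread limit of compute capability 1.x), not a tuned value.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Total threads a launch creates: (blocks in the grid) x (threads per block).
long long totalThreads(dim3 grid, dim3 block)
{
    long long blocks   = (long long)grid.x * grid.y * grid.z;
    long long perBlock = (long long)block.x * block.y * block.z;
    return blocks * perBlock;
}

// Integer ceil(n / threadsPerBlock): enough blocks to cover every element.
int blocksFor(int n, int threadsPerBlock)
{
    assert(n > 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical 1D launch configuration for a 1D array of n elements.
void launchConfigFor(int n, dim3* grid, dim3* block)
{
    const int threadsPerBlock = 256;               // assumed; multiple of the warp size
    *block = dim3(threadsPerBlock);                // y and z default to 1
    *grid  = dim3(blocksFor(n, threadsPerBlock));  // 391 blocks when n = 100000
}
```

Usage would look like `launchConfigFor(100000, &grid, &block); myKernel<<<grid, block>>>(...);`, where `myKernel` is a placeholder. Because 391 x 256 = 100,096 is slightly more than 100,000, the kernel must skip threads whose index is at or beyond n.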
My question: how should I choose a new block and grid configuration? Please help me.
(I hope my English is understandable.)