CUDA slower than CPU, please help

Hi, CUDA programmers,

I am trying to run an algorithm with CUDA on a Core i7 machine with a GeForce 9800 GT and CUDA 2.2. The CUDA version runs slower than the CPU version. I think the cause is how I set up the thread block and grid dimensions.

My algorithm:

for (… 100 rounds …) {
    for (… 100,000 rounds …) {
        … code …
        for (… 10 rounds …) {
            code
        }
        code
    }
}

In the 100,000-round and 10-round loops, the data is a 1D array, and inside these loops many functions do calculations on that 1D array.
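A minimal sketch of one common way to map this loop nest onto the GPU, assuming the 100,000 iterations of the middle loop are independent of each other (iteration i only touches element i of the 1D array) while each of the 100 outer rounds depends on the previous one. Under that assumption, each middle-loop iteration becomes one thread, the 10-round inner loop stays a serial loop inside that thread, and the outer loop stays on the host as 100 kernel launches. The names `middleLoopKernel` and `runAlgorithm` and the `v += 1.0f` body are hypothetical placeholders for the elided "code" parts. The array is copied to the GPU once and back once; copying it in every outer round can easily cost more time than the kernel saves.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: one thread per iteration of the 100,000-round loop.
// ASSUMPTION: iteration i reads and writes only data[i], so iterations are independent.
__global__ void middleLoopKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global 1D index of this thread
    if (i >= n) return;                             // the last block may have spare threads

    float v = data[i];            // placeholder for "... code ..." before the inner loop
    for (int k = 0; k < 10; ++k)  // the 10-round inner loop stays serial inside the thread
        v += 1.0f;                // placeholder for the inner loop body
    data[i] = v;                  // placeholder for "code" after the inner loop
}

// Runs outerRounds x host.size() x 10 iterations; returns false on a CUDA error.
bool runAlgorithm(std::vector<float> &host, int outerRounds)
{
    int n = (int)host.size();
    if (n == 0) return true;
    size_t bytes = n * sizeof(float);

    float *dev = 0;
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return false;

    // Copy the array to the GPU once, not once per outer round.
    bool ok = cudaMemcpy(dev, &host[0], bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
    // The outer 100-round loop stays on the host; launches in the same stream
    // run in order, so round r+1 sees the results of round r.
    for (int r = 0; ok && r < outerRounds; ++r)
        middleLoopKernel<<<blocks, threadsPerBlock>>>(dev, n);

    ok = ok && cudaGetLastError() == cudaSuccess;
    // Copy back once; this cudaMemcpy also waits for the last kernel to finish.
    ok = ok && cudaMemcpy(&host[0], dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    return ok;
}
```

If the middle-loop iterations do depend on each other (for example, iteration i reads what iteration i-1 wrote), this one-thread-per-iteration mapping gives wrong results, and the algorithm has to be restructured before it can be parallelized this way.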

Currently I use:
threadsPerBlock = (1,10); // block shape
threadsPerGrid = (1,10); // grid shape (number of blocks)
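With `(1,10)` for both the block and the grid, the kernel runs only 10 × 10 = 100 threads in total. A 9800 GT has 112 scalar processors in 14 multiprocessors and schedules threads in warps of 32, so 100 threads cannot keep it busy, and a 10-thread block uses only 10 of the 32 lanes of its warp. A more typical starting point is a 1D configuration with one thread per element of the 100,000-element loop: for example 256 threads per block (within this GPU's 512-thread-per-block limit) and enough blocks to cover every element. The helper names `makeLaunchConfig` and `warpUtilization` are hypothetical, and 256 is a common value to tune from, not a measured optimum for this algorithm.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: 1D launch configuration covering n elements,
// one thread per element. Rounds the block count up so no element is missed.
void makeLaunchConfig(int n, int threadsPerBlock, dim3 *grid, dim3 *block)
{
    *block = dim3(threadsPerBlock);                              // e.g. 256 threads
    *grid  = dim3((n + threadsPerBlock - 1) / threadsPerBlock);  // ceil(n / threadsPerBlock)
}

// Fraction of warp lanes a block of this size actually uses
// (the hardware schedules threads in warps of 32).
double warpUtilization(int threadsPerBlock)
{
    const int kWarpSize = 32;
    int warps = (threadsPerBlock + kWarpSize - 1) / kWarpSize;
    return (double)threadsPerBlock / (warps * kWarpSize);
}
```

For n = 100000 and 256 threads per block this gives 391 blocks (100,096 threads), launched as `kernel<<<grid, block>>>(...)`; the kernel then needs a bounds check such as `if (i >= n) return;` so the extra 96 threads do nothing. For comparison, `warpUtilization(10)` is 0.3125, while `warpUtilization(256)` is 1.0.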

My question: how should I choose new block and grid dimensions? Please help.

(Is my English understandable?)