bit shifts in kernel?

Hi all,

Sorry for this newbie question, but I am confused. I am just getting started with CUDA and wrote the code below. What I want is to add a constant float to each element in a vector. However, if I call the kernel with val = 1.0 and pSrcDst filled with 1.0’s, it returns all 1025.0’s. This looks like things got bit shifted. When I change the 2nd kernel line to

pSrcDst[idx] = val;

it does replace all values with 1.0.

It is probably quite trivial, but I am a bit confused at the moment.

Thanks a lot.

__global__
void cukBatchAddC_32f_I(float val, float* pSrcDst)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    pSrcDst[idx] += val;
}

void cuBatchAddC_32f_I(float val, float* pSrcDst, int len)
{
    float* dSrcDst;
    cudaMalloc((void**)&dSrcDst, len * sizeof(float));
    cudaMemcpy(dSrcDst, pSrcDst, len * sizeof(float), cudaMemcpyHostToDevice);

    dim3 threadsPerBlock(256, 1);
    dim3 numBlocks(len / threadsPerBlock.x, len / threadsPerBlock.y);
    cukBatchAddC_32f_I<<<numBlocks, threadsPerBlock>>>(val, dSrcDst);

    cudaMemcpy(pSrcDst, dSrcDst, len * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dSrcDst);

    cudaError_t error = cudaGetLastError();
    Logger::instance()->log("%s\n", cudaGetErrorString(error));
}

Well, what do you pass in for pSrcDst?

There is, however, a mistake in the calculation of numBlocks: the y dimension of the grid is len / threadsPerBlock.y = len, so the kernel is launched with len*len threads instead of just len. Since idx ignores blockIdx.y, each element gets val added up to len times instead of once (1.0 + 1024 * 1.0 = 1025.0 would fit len = 1024), and for larger values of len the launch will probably fail outright.
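A minimal sketch of a one-dimensional launch for this kernel, assuming len need not be a multiple of the block size. The extra len parameter, the bounds check in the kernel, and the helper name launchAddC are additions for illustration and are not in the original code.

```cuda
#include <cuda_runtime.h>

// Adds val to each of the first len elements of pSrcDst (a device pointer).
// The len parameter and the bounds check are additions to the original kernel.
__global__ void cukBatchAddC_32f_I(float val, float* pSrcDst, int len)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < len)              // threads past the end of the array do nothing
        pSrcDst[idx] += val;
}

// Hypothetical helper: launches just enough 1D blocks to cover len elements.
void launchAddC(float val, float* dSrcDst, int len)
{
    if (len <= 0)
        return;                 // a grid with zero blocks is not a valid launch

    const int threadsPerBlock = 256;
    // Round up so that a partial final block is still launched.
    const int numBlocks = (len + threadsPerBlock - 1) / threadsPerBlock;
    cukBatchAddC_32f_I<<<numBlocks, threadsPerBlock>>>(val, dSrcDst, len);
}
```

With this configuration each element is touched by exactly one thread, so val is added once per element regardless of len.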

To catch these kinds of problems, always check return codes of CUDA function calls and test your program under cuda-memcheck.
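As a rough sketch of what checking return codes can look like, the macro below wraps each runtime call and stops at the first failure. The names CUDA_CHECK, addC and checkedAddC are hypothetical, and printing to stderr and exiting is just one possible policy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: reports file/line and exits if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Illustrative kernel: adds val to each of the first len elements of data.
__global__ void addC(float val, float* data, int len)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < len)
        data[idx] += val;
}

// Hypothetical example: every runtime call and the kernel launch are checked.
// Assumes len > 0 and hData points to len floats on the host.
void checkedAddC(float val, float* hData, int len)
{
    float* dData = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&dData, len * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(dData, hData, len * sizeof(float), cudaMemcpyHostToDevice));

    addC<<<(len + 255) / 256, 256>>>(val, dData, len);
    CUDA_CHECK(cudaGetLastError());        // reports an invalid launch configuration
    CUDA_CHECK(cudaDeviceSynchronize());   // reports errors raised while the kernel runs

    CUDA_CHECK(cudaMemcpy(hData, dData, len * sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dData));
}
```

Checking cudaGetLastError() directly after the launch matters because a kernel launch itself returns no status. Out-of-bounds accesses inside a kernel are easier to find by running the binary under cuda-memcheck, for example cuda-memcheck ./app, where app is a placeholder for your executable.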

Update: the faulty numBlocks calculation was indeed responsible for the behavior. It was a sloppy copy/paste job, I’m afraid. Thanks again for pointing it out.

Thanks for the tip. I’ll look into that. What I pass in as pSrcDst is an array of floats of length len.