Binning with a kernel: newbie question about implementing a loop in a kernel

Hi All,

I have a Complex array (batchsize*nn*sizeof(Complex) bytes), and I would like to bin the squared magnitudes of its elements into a float array (nn*sizeof(float) bytes).
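Written as a formula, the intended binning is the following. The notation is introduced here for illustration only: $s_j$ is element $j$ of the Complex array, $P_k$ is bin $k$ of the float array, $N$ = nn and $B$ = batchsize.

$$
P_k \mathrel{+}= \sum_{b=0}^{B-1} \left| s_{b N + k} \right|^2, \qquad \left| s_j \right|^2 = s_{j,x}^2 + s_{j,y}^2, \qquad k = 0, \dots, N-1
$$

Equivalently, element $j$ of the flat array always lands in bin $j \bmod N$.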

If I do this with host variables, it's simple:

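typedef float2 Complex;

// Accumulate |h_signal[j]|^2 into bin j % nn.
// h_puffer holds nn floats and must be zeroed before the first call.
void ComplexPufferToReal(Complex *h_signal, float *h_puffer, int batchsize, int nn)
{
    int k = 0, j;   // k is the current bin index and must start at 0

    for (j = 0; j < nn*batchsize; j++)
    {
        h_puffer[k] += (h_signal[j].x * h_signal[j].x) + (h_signal[j].y * h_signal[j].y);
        k++;
        if (k > nn-1) k = 0;   // wrap around after the last bin
    }
}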

This one works just fine on the host. But I would like to do the binning on the device, since I have to do it many times in a loop after a CUFFT call.
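Before porting the loop to the GPU, it can help to write it with the bin index explicit: element j = b*nn + k (batch b, position k) always goes to bin k. The sketch below assumes the same float2-based Complex type; the name ComplexPufferToRealNested is hypothetical. It computes the same sums in the same order as the original loop.

```cuda
#include <cuda_runtime.h>   // provides float2

typedef float2 Complex;

// Same binning as ComplexPufferToReal, with the bin index k made explicit:
// element j = b*nn + k (batch b, position k) is added to bin k = j % nn.
// h_puffer must be zeroed (or hold earlier sums) before the call.
void ComplexPufferToRealNested(const Complex *h_signal, float *h_puffer,
                               int batchsize, int nn)
{
    for (int b = 0; b < batchsize; ++b) {      // over batches
        for (int k = 0; k < nn; ++k) {         // over bins
            const Complex v = h_signal[b * nn + k];
            h_puffer[k] += v.x * v.x + v.y * v.y;
        }
    }
}
```

Written this way, the loop over k has no dependence between iterations, while all iterations of the loop over b for a fixed k update the same bin. That is the property that matters when deciding how to split the work across GPU threads.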

I tried to turn ComplexPufferToReal into a kernel that works on device variables (allocated with cudaMalloc and zeroed with cudaMemset):

__global__ void ComplexPufferToReal(Complex *d_signal, float *d_puffer, int batchsize, int nn)
{
  int k=0;
  int j = blockIdx.x*blockDim.x + threadIdx.x;

  if (j<batchsize*nn)
  {
    d_puffer[k] += (d_signal[j].x * d_signal[j].x) + (d_signal[j].y * d_signal[j].y);
    k++;
    if (k>nn-1) k = 0;
  }
}

I call it like this, for example:

int batchsize = 1000;
int nn = 512;
int Nofmaxthreads = 512;
int nBlocks = (batchsize*nn)/Nofmaxthreads + ((batchsize*nn)%Nofmaxthreads == 0 ? 0 : 1);
ComplexPufferToReal<<<nBlocks, Nofmaxthreads>>>(d_signal, d_puffer, batchsize, nn);
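The nBlocks expression is a ceiling division. With these numbers, batchsize*nn = 1000*512 = 512000 elements, and 512000/512 = 1000 with remainder 0, so 1000 blocks of 512 threads are launched. With nn = 500 instead, 500000/512 = 976 with remainder 288, giving 977 blocks; the last 977*512 - 500000 = 224 threads fail the j < batchsize*nn test and do nothing. 512 threads per block is the upper limit on compute capability 1.x devices such as the 8600 GT. The common shorthand is sketched below; divUp is a hypothetical helper name.

```cuda
// Ceiling division: number of blocks of size b needed to cover a elements
// (assumes a >= 0, b > 0, and that a + b - 1 does not overflow int).
// Gives the same result as a/b + (a%b == 0 ? 0 : 1).
static int divUp(int a, int b)
{
    return (a + b - 1) / b;
}
```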

I think I've mixed something up: this does not give the same result as the host version above. Could somebody please help me implement that simple loop as a kernel that works on device variables? (I have a GeForce 8600 GT, Windows XP, VS2003 + nvcc.)

Thanks very much in advance :)

Regards,

András

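In the kernel, k is a per-thread local variable initialized to 0, so every thread writes to d_puffer[0]. Each thread runs the body only once, so the k++ and the wrap-around never affect any write.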
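Simply replacing k with j % nn in the original kernel would still not be correct: the batchsize threads that map to the same bin would all do an unsynchronized += on the same float, which is a race. atomicAdd on floats in global memory requires compute capability 2.0 or later, which the 8600 GT (compute capability 1.1) does not have. A minimal sketch of one race-free alternative is shown below. It assigns one thread per bin and lets each thread loop over the batches. The kernel name ComplexPufferToRealPerBin, the helper binOnDevice and the block size of 256 are illustrative assumptions, not part of the original code.

```cuda
#include <cuda_runtime.h>

typedef float2 Complex;

// One thread per output bin k (0 <= k < nn). Each thread loops over all
// batches and accumulates |signal[b*nn + k]|^2, so no two threads ever
// write the same element of d_puffer and no atomic operations are needed.
__global__ void ComplexPufferToRealPerBin(const Complex *d_signal, float *d_puffer,
                                          int batchsize, int nn)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;   // bin index
    if (k < nn) {
        float sum = 0.0f;
        for (int b = 0; b < batchsize; ++b) {
            // For a fixed b, neighbouring threads read neighbouring elements.
            Complex v = d_signal[b * nn + k];
            sum += v.x * v.x + v.y * v.y;
        }
        d_puffer[k] += sum;   // accumulate on top of the existing contents
    }
}

// Hypothetical test helper: copy in, run the kernel once, copy the bins back.
// In the real FFT loop the buffers would stay on the device between calls.
cudaError_t binOnDevice(const Complex *h_signal, float *h_puffer, int batchsize, int nn)
{
    size_t sigBytes = (size_t)batchsize * nn * sizeof(Complex);
    size_t binBytes = (size_t)nn * sizeof(float);
    Complex *d_signal = NULL;
    float *d_puffer = NULL;

    cudaError_t err = cudaMalloc((void **)&d_signal, sigBytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_puffer, binBytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_signal, h_signal, sigBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_puffer, h_puffer, binBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int nThreads = 256;
        int nBlocks = (nn + nThreads - 1) / nThreads;   // one thread per bin
        ComplexPufferToRealPerBin<<<nBlocks, nThreads>>>(d_signal, d_puffer, batchsize, nn);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(h_puffer, d_puffer, binBytes, cudaMemcpyDeviceToHost);

    cudaFree(d_signal);   // cudaFree(NULL) is a no-op
    cudaFree(d_puffer);
    return err;
}
```

The trade-off is that only nn threads are launched (512 in the example), each doing batchsize iterations. For small nn this may leave part of the GPU idle. Splitting the batch loop across several blocks and combining their partial sums in a second pass is one way around that, at the cost of extra code.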

Oops, I made a really, really primitive error :haha:

Thanks very much for the answer! :)

Regards.