Performance with memory assignment

Hello, I have a problem with this kernel.

I'm working on a Tesla C870 (compute capability 1.0), with MAX_XBLOCK = MAX_YBLOCK = 8.

For example, numVectors = 2000, numNeurons = 32 and dim = 3205.

__global__ void updateUKE(float *d_examples, float *d_somItems,
                          float *d_tmpD, float *d_tmpD1, float rr1,
                          int numVectors, int numNeurons, int dim)
{
    int x = blockIdx.x * MAX_XBLOCK + threadIdx.x;
    int y = blockIdx.y * MAX_YBLOCK + threadIdx.y;

    if ((x < numVectors) && (y < numNeurons)) {
        int j;
        float auxDist = 0.0f, temp = 0.0f;

        for (j = 0; j < dim; j++) {
            auxDist += powf(d_examples[x*dim + j] - d_somItems[y*dim + j], 2);
        }

        d_tmpD[x*numNeurons + y]  = auxDist;
        d_tmpD1[x*numNeurons + y] = -auxDist / rr1;
    } // if (x < numVectors) && (y < numNeurons)

    __syncthreads();
}

The problem is that this kernel is very slow: it takes 0.4 s. But if I change the lines:

d_tmpD[x*numNeurons + y]  = auxDist;
d_tmpD1[x*numNeurons + y] = -auxDist / rr1;

by

d_tmpD[x*numNeurons + y]  = temp;
d_tmpD1[x*numNeurons + y] = temp;

the time is now 0.01 s. What is the difference between the registers auxDist and temp? auxDist is a register computed by each thread before the assignment d_tmpD[…] = auxDist. I don't understand this… Can somebody help me?

Thanks, Francisco

It's doing exactly what it should — the optimizer is good.

If you replace the lines with temp, the compiler sees that the computation of auxDist is never used and therefore isn't needed. So it removes the auxDist loop entirely, and the kernel runs much faster.

Also, in the loop above you are using the function powf to compute a square. That's very bad, because powf is probably ten times slower than a single multiplication.

giovanni

Ok, but I tried the same code on the CPU with a sequential loop, that is:

for (i = 0; i < numVectors…
    for (j = 0; j < numNeurons…
        for (k = 0; k < dim…

and the compute time is the same as the execution on the GPU… why? Can't this algorithm be parallelized with CUDA?

Thanks,

No, with a multiplication the time is the same.

Best regards, Francisco