Vector refuses to return correct number

First, I want to apologize for any strange grammar in this text; I’m not a native English speaker.

I have a problem in a kernel like this (some parts of it make no sense, but please ignore that).


 int iy = blockDim.y * blockIdx.y + threadIdx.y; 
 int ix = blockDim.x * blockIdx.x + threadIdx.x; 
 int idx = iy * gridDim.x + ix;
 
 int i = 0;
 d_resul[idx] = 0;
 
 if (idx <= size_val)
 {
     d_resul[idx] = d_val[(int)d_ptr[idx]+i] * d_vector[(int)d_ind[(int)d_ptr[idx]+i]];
 }

After execution, all computed elements of d_resul were 0.

So I went to check whether there were errors in the two operands of the multiplication: “d_val[(int)d_ptr[idx]+i]” and “d_vector[(int)d_ind[(int)d_ptr[idx]+i]]”.

“d_val[(int)d_ptr[idx]+i]” was OK: it returned the number it should.

“d_vector[(int)d_ind[(int)d_ptr[idx]+i]]”, however, always returned 0, which it shouldn’t.
So I decided to see which position of the array d_vector it was accessing. In other words, I tested “(int)d_ind[(int)d_ptr[idx]+i]”, and it returned the number it should.

So the error must be in the d_vector array. But it is not: I have tested d_vector and it’s OK.

In other words, the following happens (example):
(int)d_ind[(int)d_ptr[idx]+i] is 5
d_vector[5] is 10

but d_vector[(int)d_ind[(int)d_ptr[idx]+i]] is 0, when it should be 10.

Does anyone have any idea why this happens?

Thanks!
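
For reference, the separate checks described above can be combined into one launch that records, for each thread, both the index used to read d_vector and the value actually read through it. This is only a sketch under assumptions: it supposes that d_ptr, d_ind and d_vector are float arrays (which the (int) casts suggest but the post does not confirm), it uses a 1D launch, and the names dump_gather, run_dump_gather, dbg_index and dbg_value are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical debug kernel: for each idx, store the index used to read
// d_vector and the value actually read through that index.
__global__ void dump_gather(const float *d_ptr, const float *d_ind,
                            const float *d_vector, int n,
                            int *dbg_index, float *dbg_value)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < n) {
        int col = (int)d_ind[(int)d_ptr[idx]];  // same expression as in the post, with i = 0
        dbg_index[idx] = col;
        dbg_value[idx] = d_vector[col];
    }
}

// Hypothetical host helper: copies the inputs to the device, runs dump_gather,
// and copies both debug arrays back. Returns false if any CUDA call fails.
bool run_dump_gather(const std::vector<float> &ptr, const std::vector<float> &ind,
                     const std::vector<float> &vec,
                     std::vector<int> &out_index, std::vector<float> &out_value)
{
    int n = (int)ptr.size();
    float *d_ptr = nullptr, *d_ind = nullptr, *d_vector = nullptr, *dbg_value = nullptr;
    int *dbg_index = nullptr;

    bool ok = cudaMalloc(&d_ptr, ptr.size() * sizeof(float)) == cudaSuccess
           && cudaMalloc(&d_ind, ind.size() * sizeof(float)) == cudaSuccess
           && cudaMalloc(&d_vector, vec.size() * sizeof(float)) == cudaSuccess
           && cudaMalloc(&dbg_index, n * sizeof(int)) == cudaSuccess
           && cudaMalloc(&dbg_value, n * sizeof(float)) == cudaSuccess;

    if (ok) {
        ok = cudaMemcpy(d_ptr, ptr.data(), ptr.size() * sizeof(float), cudaMemcpyHostToDevice) == cudaSuccess
          && cudaMemcpy(d_ind, ind.data(), ind.size() * sizeof(float), cudaMemcpyHostToDevice) == cudaSuccess
          && cudaMemcpy(d_vector, vec.data(), vec.size() * sizeof(float), cudaMemcpyHostToDevice) == cudaSuccess;
    }
    if (ok) {
        dump_gather<<<(n + 127) / 128, 128>>>(d_ptr, d_ind, d_vector, n, dbg_index, dbg_value);
        // Check both the launch itself and the execution of the kernel.
        ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    }
    if (ok) {
        out_index.resize(n);
        out_value.resize(n);
        ok = cudaMemcpy(out_index.data(), dbg_index, n * sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(out_value.data(), dbg_value, n * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts a null pointer, so this is safe after a failed allocation.
    cudaFree(d_ptr); cudaFree(d_ind); cudaFree(d_vector);
    cudaFree(dbg_index); cudaFree(dbg_value);
    return ok;
}
```

If dbg_index holds the expected positions while dbg_value is still 0, the contents of d_vector on the device (rather than the indexing) would be the next thing to inspect.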

There are a lot of reasons why this might happen. I’d start with a simple check, right after the kernel call, that the kernel actually ran and has finished working (cudaThreadSynchronize and cudaGetLastError). If the kernel ran fine, modify the kernel to something like this:

 __global__ void ....
 {
     d_resul[idx] = threadIdx.x;
 }

Now check in the host code that the values are OK. Gradually re-enable more and more of your original code until you see the problem.
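
The check suggested above could look roughly like the following. It is a minimal sketch, assuming a 1D launch and an int output buffer; the names fill_thread_ids and run_thread_id_check are hypothetical. It calls cudaDeviceSynchronize because cudaThreadSynchronize is deprecated in newer CUDA releases.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stripped-down kernel: each thread only records its threadIdx.x.
__global__ void fill_thread_ids(int *d_resul, int n)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < n)
        d_resul[idx] = threadIdx.x;
}

// Launches the kernel, checks that it launched and finished without error,
// then verifies on the host that every element holds the expected threadIdx.x.
// Returns true only if all checks pass.
bool run_thread_id_check(int n, int threads_per_block)
{
    int *d_resul = nullptr;
    if (cudaMalloc(&d_resul, n * sizeof(int)) != cudaSuccess)
        return false;

    int blocks = (n + threads_per_block - 1) / threads_per_block;
    fill_thread_ids<<<blocks, threads_per_block>>>(d_resul, n);

    cudaError_t err = cudaGetLastError();   // errors from the launch itself
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while the kernel ran
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_resul);
        return false;
    }

    std::vector<int> h_resul(n);
    err = cudaMemcpy(h_resul.data(), d_resul, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_resul);
    if (err != cudaSuccess)
        return false;

    // With a 1D launch, element i was written by the thread with threadIdx.x == i % blockDim.x.
    for (int i = 0; i < n; ++i)
        if (h_resul[i] != i % threads_per_block)
            return false;
    return true;
}
```

Once this passes, the original statements can be put back one at a time, rerunning the same host-side comparison after each change.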

eyal