Writing to a device memory variable inside a kernel

Hi

I’m facing a weird problem in my kernel code. For a better understanding of the problem, I’m posting a simple example kernel. Please help me with this problem.

Here is the example kernel code,

Configuration:
Grid size: (3625, 1)
Thread block size: (64, 1)

Code:

texture<unsigned int, 2, cudaReadModeElementType> texlookup;

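__global__ void runCUDA(float *dA)
{
    int bx = blockIdx.x;   // texture row (y), one per block
    int tx = threadIdx.x;  // texture column (x), one per thread

    // Fetch the precomputed destination index and write to it
    unsigned int didx = tex2D(texlookup, tx, bx);
    dA[didx] = 10;
}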

dA is initialized with zeros.
dA size: 400000

texlookup size: x:64,y:3625
It contains the indices 0, 30, 60, … 27532 (up to 232000 values). These indices were precomputed on the host side based on some logic and stored in the texture.

When I copy dA back to the host and print the results, they are not what I expected.
According to my understanding, the results should be:
dA[0] = 10, dA[30] = 10, dA[60] = 10, …
and the other, intermediate elements should retain their original values,
dA[1] = 0, dA[2] = 0…
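
For comparison, the expected result can be computed on the host with a CPU reference of the same scatter. This is only a sketch: `referenceScatter` and its parameters are hypothetical names, and it assumes `lookup` is a row-major host copy of the texture contents (width 64, height 3625 in the configuration above).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for the scatter in runCUDA: each (tx, bx) pair writes 10
// to dA[lookup[bx * width + tx]]. `lookup` is assumed to be a row-major
// host copy of the texture; all names here are hypothetical.
std::vector<float> referenceScatter(const std::vector<unsigned int>& lookup,
                                    std::size_t width, std::size_t height,
                                    std::size_t dASize)
{
    assert(lookup.size() == width * height);
    std::vector<float> dA(dASize, 0.0f);  // dA starts out as all zeros
    for (std::size_t bx = 0; bx < height; ++bx) {
        for (std::size_t tx = 0; tx < width; ++tx) {
            unsigned int didx = lookup[bx * width + tx];
            if (didx < dASize) {
                dA[didx] = 10.0f;  // skip indices that fall outside dA
            }
        }
    }
    return dA;
}
```

Comparing this vector element by element with the dA copied back from the device shows which locations differ from what the lookup table alone would produce.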

But for some reason CUDA writes output as follows,
dA[0] = 10, dA[2] = 10, dA[4] = 10… dA[30] = 10,…

I’m writing only to the index locations fetched from the texture. Why are other index locations getting affected? I’m not worried about coalesced writes right now.

Am I missing something here?

I appreciate your help.

Thanks

Hi dwaras3284,

 One idea--with 232000 texture values and a stride of 30 between values, this would put your upper limit at about 6960000, which would overrun your dA size of 400000 (even more so if that value is given in bytes).
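
 The arithmetic behind that estimate, and a way to check the real table before uploading it, could look like the sketch below. It assumes the indices are available in a host-side array before they go into the texture; `largestStridedIndex` and `countOutOfRange` are hypothetical helper names, not part of CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Largest index produced by `count` values that start at 0 and advance
// by a fixed `stride` (hypothetical helper for the estimate above).
unsigned long long largestStridedIndex(unsigned long long count,
                                       unsigned long long stride)
{
    assert(count > 0);
    return (count - 1) * stride;
}

// Number of entries in the host-side index table that would write
// outside an array of dASize elements (hypothetical helper).
std::size_t countOutOfRange(const std::vector<unsigned int>& indices,
                            std::size_t dASize)
{
    std::size_t bad = 0;
    for (unsigned int idx : indices) {
        if (idx >= dASize) {
            ++bad;
        }
    }
    return bad;
}
```

 With 232000 values and a stride of 30, `largestStridedIndex(232000, 30)` gives 6959970, well past 400000. If `countOutOfRange` returns anything other than zero for the real table, the kernel is writing outside dA.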

 Syntactically, I don't see anything else wrong.  The kernel alone does not show the host-side code, so if the idea above does not fix the issue, maybe you could post the host code as well.

 Hope this helps.

twbjr