Hi
I’m facing a weird problem in my kernel code. For a better understanding of the problem, I’m posting a simple example kernel. Please help me with this problem.
Here is the example kernel code,
Configuration:
Grid size: (3625, 1), thread block size: (64, 1)
Code:
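texture<unsigned int, 2, cudaReadModeElementType> texlookup;
__global__ void runCUDA(float *dA)
{
    int bx = blockIdx.x;   // row of the lookup table (one row per block)
    int tx = threadIdx.x;  // column of the lookup table (one column per thread)
    // Fetch the precomputed destination index for this thread.
    unsigned int didx = tex2D(texlookup, tx, bx);
    dA[didx] = 10;
}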
dA is initialized with zeros.
dA size: 400000
texlookup size: x:64,y:3625
It contains the indices 0, 30, 60, …27532 (up to 232000 values, i.e. 64 × 3625 entries). These indices were precomputed on the host side based on some logic and stored in the texture.
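To make the table layout concrete, here is a minimal host-side sketch. It assumes the host array is row-major with 64 entries per row, so that the texel fetched by tex2D(texlookup, tx, bx) corresponds to host element bx * 64 + tx. The names kTexWidth, kTexHeight and texelToLinear are made up for this illustration and are not part of my real code.

```cpp
#include <cassert>
#include <cstddef>

// Assumed table dimensions, taken from the sizes quoted above.
constexpr std::size_t kTexWidth  = 64;    // x extent: one texel per thread
constexpr std::size_t kTexHeight = 3625;  // y extent: one row per block

// Hypothetical helper: linear offset, in a row-major host array,
// of the texel addressed as (tx, bx).
inline std::size_t texelToLinear(std::size_t tx, std::size_t bx)
{
    assert(tx < kTexWidth && bx < kTexHeight);
    return bx * kTexWidth + tx;
}
```

If the copy into the texture used a different width or pitch than this, each thread would fetch a different entry than the one I intended.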
When I copy dA back to the host and print the results, they are not what I expected.
According to my understanding, the results should look like this,
dA[0] = 10, dA[30] = 10, dA[60] = 10, …
while the other, intermediate elements should retain their original values,
dA[1] = 0, dA[2] = 0…
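The expected behaviour can be written down as a small CPU reference, sketched below under the assumption that the output starts zero-filled and that every table entry is a valid index. The function name referenceScatter is hypothetical and only serves as something to compare the GPU output against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: starts from a zero-filled array of size n and
// writes 10 to every element whose index appears in lookup.
std::vector<float> referenceScatter(const std::vector<unsigned int>& lookup,
                                    std::size_t n)
{
    std::vector<float> out(n, 0.0f);
    for (unsigned int idx : lookup) {
        assert(idx < n);  // an out-of-range index would mean the table is wrong
        out[idx] = 10.0f;
    }
    return out;
}
```

Comparing the array copied back from the device element by element with the result of referenceScatter on the host table (with n = 400000) should show the first index at which the two differ.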
But for some reason CUDA writes output as follows,
dA[0] = 10, dA[2] = 10, dA[4] = 10… dA[30] = 10,…
I’m writing only to the index locations fetched from the texture. Why are other index locations getting affected? I’m not worried about coalesced writes right now.
Am I missing something here?
I appreciate your help.
Thanks