I’m facing a strange problem in my CUDA kernel code. To make the problem easier to understand, I’m posting a simplified example kernel. Please help me with this.

Here is the example kernel code:

Grid size: (3625, 1), thread block size: (64, 1)
texture<unsigned int, 2, cudaReadModeElementType> texlookup;

__global__ void runCUDA(float *dA)
{
    int bx = blockIdx.x;
    int tx = threadIdx.x;
    unsigned int didx = tex2D(texlookup, tx, bx);
    dA[didx] = 10;
}
dA is initialized with zeros.
dA size: 400000 elements.
texlookup size: x: 64, y: 3625.
texlookup contains indices 0, 30, 60, …, 27532 (232000 values in total). These indices were precomputed on the host side based on some application-specific logic and stored in the texture.
When I copy dA back to the host and print the results, they are not what I expected.
According to my understanding, the results should look like this:
dA[0] = 10, dA[30] = 10, dA[60] = 10, …
with all other intermediate locations retaining their original values:
dA[1] = 0, dA[2] = 0, …
But for some reason, in the output CUDA produces, many locations that were never fetched from the texture also contain 10.
I’m writing only to the index locations fetched from the texture, so why are other locations getting affected? (I’m not worried about coalesced writes right now.)
Am I missing something here?
I appreciate your help.