I’m having a problem with textures. I bind linear memory to 2D texture and then run a kernel where I copy values from texture back to device memory. I did test program and everything went well but then I used the same code in my simulation program and it doesn’t work.
(*fluid).cx = 10; //the width of array is then 12 (*fluid).cy = 10; //height is 12 cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>(); texTest.addressMode = cudaAddressModeClamp; texTest.addressMode = cudaAddressModeClamp; texTest.filterMode = cudaFilterModePoint; texTest.normalized = false; cudaBindTexture2D(0, texTest, d_test, channelDesc, (*fluid).cx+2, (*fluid).cy+2, ((*fluid).cx+2)*sizeof(float) ); dim3 threadsPerBlock(16, 16); dim3 numBlocks( (((*fluid).cx+2)+threadsPerBlock.x-1) / threadsPerBlock.x, (((*fluid).cy+2)+threadsPerBlock.y-1) / threadsPerBlock.y); TESTKERNEL<<<numBlocks, threadsPerBlock>>>(d_testR, d_xco, d_yco, d_fluid); //d_testR-array to store output, d_xco and d_yco are arrays to store threads indxs and d_fluid contains info about the simulation grid like size etc //copy back to host and print
and the part of output is
index in linear memory = value from texture = right value x and y coordinate of thread ... 7= 7.000000 = 7.000000 7 0 8= 8.000000 = 8.000000 8 0 9= 9.000000 = 9.000000 9 0 10= 10.000000 = 10.000000 10 0 11= 11.000000 = 11.000000 11 0 12= 8.000000 = 12.000000 0 1 13= 9.000000 = 13.000000 1 1 14= 10.000000 = 14.000000 2 1 15= 11.000000 = 15.000000 3 1 ...
So values for first row are ok but all values in second row are smaller by 4 tha they shloud be. in 3. row they are smaller by 8, in 4. row by 12 etc.
These differencies differs with width and height of the array. If with and height is something like 64 16 etc. everything is ok.
I’m running the simulation without any problems, so the problem shouldn’t be something like that I’m not having enough threads or something like this. But now I’m using cudaArrays and I find out that memcpy(deviceToDevice) is two time faster than memcpyToArray(deviceToDevice). So I want to use linear memory instead.
Anyone has an idea what might be a problem?? I would be most greatful for any help.