I can’t get the texture unit working as expected.
I am new to graphics programming, so maybe I have not understood the texture concept correctly.
I am implementing a sparse-matrix-vector multiplication, which involves random (i.e. non-coalesced) memory access. The CUDA programming guide suggests routing such reads through the texture unit, so I replaced my direct global-memory reads with texture fetches:
(source_vector is not sparse)
cudaBindTexture(0, tex_source, source_vector, channelDesc_source, size);
// kernel code uses: tex1Dfetch(tex_source, index)
cudaUnbindTexture(tex_source);
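For reference, here is the full pattern I am trying to follow, as a minimal sketch (the kernel name `gather`, the launch configuration, and the array sizes are just placeholders, not my real code). As I understand it, the texture reference must be declared at file scope, the element type must be a supported one such as float, and linear memory must be read with tex1Dfetch (not tex1D):

```cuda
#include <cuda_runtime.h>

// Texture reference: must be declared at file scope, in the same
// translation unit as the kernel that fetches from it.
texture<float, 1, cudaReadModeElementType> tex_source;

// Placeholder kernel: each thread does one random read through the
// texture cache (tex1Dfetch is the fetch function for linear memory).
__global__ void gather(float *out, const int *index, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(tex_source, index[i]);
}

int main()
{
    const int n = 256;                       // placeholder size
    size_t bytes = n * sizeof(float);

    float *source_vector, *out;
    int   *index;
    cudaMalloc((void **)&source_vector, bytes);
    cudaMalloc((void **)&out, bytes);
    cudaMalloc((void **)&index, n * sizeof(int));
    // ... copy the dense vector and the column indices to the device ...

    // Bind the linear device buffer to the texture reference.
    cudaChannelFormatDesc channelDesc_source = cudaCreateChannelDesc<float>();
    cudaBindTexture(0, tex_source, source_vector, channelDesc_source, bytes);

    gather<<<(n + 127) / 128, 128>>>(out, index, n);

    cudaUnbindTexture(tex_source);

    cudaFree(source_vector);
    cudaFree(out);
    cudaFree(index);
    return 0;
}
```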
It works perfectly in device-emulation mode, but not on the card: there I get wrong results. Is there something wrong with this way of using textures? (The speedup I do see is roughly 2x.)
Thanks a lot in advance,