Thank you for your quick response. I know that fetching by row works as you described. The problem occurs when I fetch along the columns of a texture that has more columns than rows, as in the example I gave, which has 2 columns and 1 row.
I know it should work, because it does work when the texture is bound to a CUDA array instead. For some reason, textures bound to linear memory do not work. Either I have to resort to one-dimensional textures, or I have to transpose my data so that the number of columns is less than the number of rows.
2D textures have certain layout requirements, so one should use cudaMallocPitch() instead of plain cudaMalloc() when binding a 2D texture to linear memory. As a defensive programming practice, I recommend never passing NULL as the first argument of a cudaBindTexture* call. Instead, pass in a suitable pointer so that the function can return the texture offset, and have the code take appropriate action if the offset is non-zero.
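The advice above can be sketched roughly as follows. This is a minimal illustration, not the original poster's code: the names texRef, readTexels, and CHECK are hypothetical, and the 2-column by 1-row float layout is taken from the example mentioned in the thread. It assumes a toolkit that still ships the legacy texture reference API (it was removed in CUDA 12).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking macro: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(e_));                          \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Legacy 2D texture reference over float elements (pre-CUDA 12 API).
texture<float, cudaTextureType2D, cudaReadModeElementType> texRef;

// Each thread reads one texel and writes it to a tightly packed output.
__global__ void readTexels(float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[y * width + x] = tex2D(texRef, x + 0.5f, y + 0.5f);
}

int main()
{
    const int width = 2, height = 1;  // 2 columns, 1 row, as in the thread
    float h_in[width * height] = {1.0f, 2.0f};
    float h_out[width * height] = {0.0f, 0.0f};

    // Let the runtime choose a row pitch that satisfies the layout rules.
    float *d_in = NULL;
    size_t pitch = 0;
    CHECK(cudaMallocPitch((void **)&d_in, &pitch,
                          width * sizeof(float), height));
    CHECK(cudaMemcpy2D(d_in, pitch, h_in, width * sizeof(float),
                       width * sizeof(float), height,
                       cudaMemcpyHostToDevice));

    float *d_out = NULL;
    CHECK(cudaMalloc((void **)&d_out, sizeof(h_out)));

    // Pass a real pointer for the offset instead of NULL, then inspect it.
    size_t offset = 0;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    CHECK(cudaBindTexture2D(&offset, texRef, d_in, desc,
                            width, height, pitch));
    if (offset != 0) {
        // This sketch simply gives up; real code could compensate instead.
        fprintf(stderr, "unexpected texture offset: %zu bytes\n", offset);
        return EXIT_FAILURE;
    }

    readTexels<<<dim3(1, 1), dim3(width, height)>>>(d_out, width, height);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost));
    printf("pitch = %zu bytes, texels = %f %f\n", pitch, h_out[0], h_out[1]);

    CHECK(cudaUnbindTexture(texRef));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    return 0;
}
```

The point of the sketch is that the pitch passed to cudaBindTexture2D() is the one returned by cudaMallocPitch(), not width * sizeof(float), and that the returned offset is checked rather than discarded.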
Thank you very much. cudaMallocPitch() solved my problem. It turns out that the pitch parameter of cudaBindTexture2D() must be a multiple of 256 bytes. Since my pitch was only 4 bytes, fetches from the second column were clamped to the first column.
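For anyone who wants to check the requirement on their own hardware, the sketch below queries the alignment values the runtime reports. The required pitch alignment is device-dependent and is exposed as cudaDeviceProp::texturePitchAlignment, so it may differ from the 256 bytes observed here. The helper isPitchAligned is a hypothetical name used only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true if the pitch is a multiple of the alignment.
static bool isPitchAligned(size_t pitch, size_t alignment)
{
    return alignment != 0 && pitch % alignment == 0;
}

int main()
{
    int device = 0;
    cudaDeviceProp prop;
    if (cudaGetDevice(&device) != cudaSuccess ||
        cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        fprintf(stderr, "could not query device properties\n");
        return 1;
    }

    // Base-address and row-pitch alignment for textures on linear memory.
    printf("textureAlignment      = %zu bytes\n", prop.textureAlignment);
    printf("texturePitchAlignment = %zu bytes\n", prop.texturePitchAlignment);

    // A tightly packed row of 2 floats, as in the thread's example.
    size_t tightPitch = 2 * sizeof(float);
    printf("tight pitch of %zu bytes is %s\n", tightPitch,
           isPitchAligned(tightPitch, prop.texturePitchAlignment)
               ? "aligned" : "not aligned");
    return 0;
}
```

If the tight pitch is reported as not aligned, allocating with cudaMallocPitch() and passing its returned pitch to the bind call avoids the problem.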