In my application, I use a texture bound to a 2D CUDA array. I access the CUDA array with the tex2D function in my kernel.
With a texture width <= 16384, there is no problem. From a width of 16385 onward, my kernel returns strange results. Surely, if my kernel works
with a texture of 100 columns, it should also work with a texture of 30000 columns. As far as I know, the texture limit is 65536 columns
and 32768 rows.
So, I’ve written a short program that performs the following steps. Below, “array” denotes a standard “float *”.
Randomly initialize an array of size WIDTHxHEIGHT on the HOST. This array is denoted by “A”.
Copy the memory from A to a CUDA array of size WIDTHxHEIGHT. This CUDA array is denoted by “B”.
Define a 2D texture. This texture is denoted by “C”.
Bind texture C to CUDA array B.
Define an array of size WIDTHxHEIGHT on DEVICE. This array is denoted by “D”.
Call the kernel. This kernel copies C to D using tex2D.
Define an array of size WIDTHxHEIGHT on HOST. This array is denoted by “E”.
Copy memory from D to E.
Compare A and E.
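The steps above can be sketched roughly as follows. This is not the attached code, only a minimal sketch under stated assumptions: it uses the legacy texture reference API (the one cudaBindTextureToArray belongs to), which was removed in CUDA 12, so it needs an older toolkit. The names texC, copyKernel and countMismatches are made up for illustration, as are the 16x16 block size and the point-filtering settings.

```cuda
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// C: 2D texture reference. Legacy API, assumed here because the post uses
// cudaBindTextureToArray; it was removed in CUDA 12.
texture<float, 2, cudaReadModeElementType> texC;

// Copies texture C into the linear device array D, one texel per thread.
__global__ void copyKernel(float* D, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        D[(size_t)y * width + x] = tex2D(texC, x + 0.5f, y + 0.5f);
}

// Hypothetical driver: returns the number of elements where A and E differ,
// or -1 if a CUDA call reports an error.
long countMismatches(int width, int height)
{
    size_t n = (size_t)width * height;
    size_t rowBytes = width * sizeof(float);
    std::vector<float> A(n), E(n, -1.0f);
    for (size_t i = 0; i < n; ++i) A[i] = rand() / (float)RAND_MAX;  // A (host)

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray* B = 0;
    float* D = 0;

    cudaError_t err = cudaMallocArray(&B, &desc, width, height);     // B
    if (err == cudaSuccess)                                          // A -> B
        err = cudaMemcpy2DToArray(B, 0, 0, A.data(), rowBytes, rowBytes,
                                  height, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {                                        // bind C to B
        texC.normalized = false;                // unnormalized coordinates
        texC.filterMode = cudaFilterModePoint;  // no interpolation
        err = cudaBindTextureToArray(texC, B, desc);
    }
    if (err == cudaSuccess)                                          // D
        err = cudaMalloc((void**)&D, n * sizeof(float));
    if (err == cudaSuccess) {                                        // C -> D
        dim3 block(16, 16);
        dim3 grid((width + 15) / 16, (height + 15) / 16);
        copyKernel<<<grid, block>>>(D, width, height);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)                                          // D -> E
        err = cudaMemcpy(E.data(), D, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaUnbindTexture(texC);
    cudaFree(D);
    cudaFreeArray(B);
    if (err != cudaSuccess) return -1;

    long mismatches = 0;                                             // compare A and E
    for (size_t i = 0; i < n; ++i)
        if (A[i] != E[i]) ++mismatches;
    return mismatches;
}
```

Calling countMismatches(16384, 10) and countMismatches(16385, 10) would correspond to the two cases described here. What they return depends on the GPU and toolkit, so the sketch makes no claim about reproducing the reported numbers.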
A and E should be equal. With WIDTH<=16384, there are no problems. With WIDTH>16384, the outermost columns differ:
with HEIGHT=10 and WIDTH=16385, the first and last columns differ.
with HEIGHT=10 and WIDTH=16386, the first 2 and last 2 columns differ.
…
I have attached to this message the code needed to verify this behavior.
Compile with “make”.
Run with “make run”.
Change the constant “WIDTH” to 16384 => there are no errors.
Change the constant “WIDTH” to 16385 => there are 20 errors (2*HEIGHT).
Change the constant “WIDTH” to 16386 => there are 40 errors (4*HEIGHT).
…
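The three reported counts fit a simple pattern: each column beyond 16384 adds one bad column at each edge, in every row. The helper below is a hypothetical illustration of that arithmetic in plain host-side C++ (it compiles unchanged in a .cu file). The post only shows the three widths, so whether the pattern continues for larger widths is an assumption.

```cpp
// Hypothetical helper: mismatch count implied by the three reported runs.
// Assumes (width - 16384) bad columns at each edge, with every row affected.
long expectedMismatches(long width, long height)
{
    const long kGoodWidth = 16384;  // widest WIDTH reported to work
    if (width <= kGoodWidth) return 0;
    return 2 * (width - kGoodWidth) * height;
}
```

With HEIGHT=10 this gives 0, 20 and 40 for WIDTH = 16384, 16385 and 16386, matching the counts listed above.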
In the Programming Guide, the maximum width is given as 65536, but it is not specified whether the limit is in bytes or in number of elements.
For a grid, the limit is a width of 65536 blocks. Each thread block is limited to 512 threads. So a grid can cover an array 65536*512 elements wide.
I think it should be the same for textures, no? Maybe you’re right and the limit is a width of 65536 bytes. If you’re right, we can only use float arrays of width 16384, which is not enough for GPGPU applications.
Note that 65536 is the per-dimension limit of a grid, whereas 512 is a hard limit on the size of one block…
By the way, array sizes are not limited by block and thread dimensions… For example, the code below can clear large arrays and is not limited by them. It is only limited by the 32-bit “i”.
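The code referred to is not included in the thread as reproduced here. A kernel in that spirit might look like the following sketch. The names clearKernel and clearsAll, the grid-stride loop, and the launch of 64 blocks of 256 threads are all assumptions for illustration, not the poster's code.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: zeroes n floats with a grid-stride loop, so n may
// exceed the number of launched threads. The 32-bit index "i" is the only
// bound on n here (this assumes n + stride does not overflow 32 bits).
__global__ void clearKernel(float* data, unsigned int n)
{
    unsigned int stride = gridDim.x * blockDim.x;
    for (unsigned int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] = 0.0f;
}

// Hypothetical check: fills n floats with non-zero bytes, clears them with a
// launch of only 64 * 256 threads, and reports whether every element is zero.
bool clearsAll(unsigned int n)
{
    float* d = 0;
    if (cudaMalloc((void**)&d, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemset(d, 0xFF, n * sizeof(float));
    clearKernel<<<64, 256>>>(d, n);

    std::vector<float> h(n, 1.0f);
    bool ok = cudaMemcpy(h.data(), d, n * sizeof(float),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d);
    for (unsigned int i = 0; ok && i < n; ++i) ok = (h[i] == 0.0f);
    return ok;
}
```

Here the launch has 16384 threads in total, yet clearsAll(100000) still covers every element because each thread walks the array in steps of the grid size.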
Moreover, texture binding takes only bytes as arguments. Why would a texture be concerned with integers, floats, and other data types? I would assume that it is only concerned with “bytes”.
You can always use an array of textures to cover your larger arrays.
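One way to read this suggestion is to split a wide array into column tiles, each narrow enough to be bound to its own texture. The helper below is a hypothetical host-side sketch of that split in plain C++ (usable as-is in a .cu file). ColumnTile and splitColumns are made-up names, and the tile width of 16384 floats in the example is the working limit observed in this thread, not a documented value.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of one tile: its first column and its width.
struct ColumnTile { int firstCol; int numCols; };

// Splits `width` columns into consecutive tiles of at most `maxCols` columns.
std::vector<ColumnTile> splitColumns(int width, int maxCols)
{
    std::vector<ColumnTile> tiles;
    for (int first = 0; first < width; first += maxCols)
        tiles.push_back({first, std::min(maxCols, width - first)});
    return tiles;
}
```

For a 30000-column array and a 16384-column limit this yields two tiles, columns 0..16383 and 16384..29999. A kernel that reads neighbouring texels across a tile border would also need the tiles to overlap, which this sketch does not handle.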
As Mr. Anderson suggested, you should check the return value of the texture “bind” call.
OK, it seems clear that the maximum width of a texture is 65536 bytes, which corresponds to 16384 floats.
In my initial code, I checked the return value of the cudaBindTextureToArray function. Even when the width limit was exceeded, cudaBindTextureToArray returned “no error”. This is exactly why I thought the limit had not been exceeded…
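Since the bind call does not report the problem, one option is to guard against it by hand before binding. The check below is a hypothetical host-side sketch in plain C++ (it also compiles in a .cu file). The 65536-byte row limit is the conclusion reached in this thread, not a figure confirmed by the Programming Guide, and rowFitsTexture is a made-up name.

```cpp
#include <cstddef>

// Assumption taken from this thread: a texture row may span at most
// 65536 bytes, whatever the element type.
const std::size_t kAssumedMaxRowBytes = 65536;

// Returns true if a row of `widthElems` elements of `elemSize` bytes each
// stays within the assumed limit.
bool rowFitsTexture(std::size_t widthElems, std::size_t elemSize)
{
    return widthElems * elemSize <= kAssumedMaxRowBytes;
}
```

With 4-byte floats, rowFitsTexture(16384, sizeof(float)) is true and rowFitsTexture(16385, sizeof(float)) is false, which matches the widths where the test starts to fail.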