I have implemented some fairly complex kernels and run into a problem with the numbering of texunits.
My code uses templates and instantiates my kernels on a variety of data types.
Since I have been unable to pass textures of the corresponding format as arguments to the kernels, I have instead declared a lot of (37, to be precise) texture<type, 1> variables globally in my .cu file. Using macros keyed on another template argument, I can then pass the correct texture variable name to tex1Dfetch() without any overhead.
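To illustrate the setup described above, here is a minimal sketch of picking a global texture by data type. It uses a specialized helper struct where the post uses macros, and it assumes the legacy texture reference API of that CUDA generation (removed in CUDA 12). The names texFloat, texInt, TexFor, and copyFromTexture are hypothetical and are not taken from the actual code in this post. It does not change how the texunits are numbered.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Legacy texture references must be declared at file scope.
// Two hypothetical textures stand in for the 37 in the post.
texture<float, 1, cudaReadModeElementType> texFloat;
texture<int,   1, cudaReadModeElementType> texInt;

// Hypothetical helper: maps a data type to its global texture.
// Left undefined for unsupported types so misuse fails at compile time.
template <typename T> struct TexFor;

template <> struct TexFor<float> {
    static __device__ float fetch(int i) { return tex1Dfetch(texFloat, i); }
};

template <> struct TexFor<int> {
    static __device__ int fetch(int i) { return tex1Dfetch(texInt, i); }
};

// Templated kernel: the texture is chosen at compile time through TexFor<T>,
// so there is no runtime dispatch inside the kernel.
template <typename T>
__global__ void copyFromTexture(T* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = TexFor<T>::fetch(i);
}

int main()
{
    const int n = 256;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 0.5f * i;

    float* dIn = 0;
    float* dOut = 0;
    cudaMalloc((void**)&dIn, n * sizeof(float));
    cudaMalloc((void**)&dOut, n * sizeof(float));
    cudaMemcpy(dIn, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // Bind linear device memory to the texture reference used by TexFor<float>.
    cudaBindTexture(0, texFloat, dIn, n * sizeof(float));
    copyFromTexture<float><<<(n + 127) / 128, 128>>>(dOut, n);
    cudaMemcpy(host, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaUnbindTexture(texFloat);

    printf("host[10] = %f\n", host[10]);

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Each kernel instantiation only touches the texture for its own type, which matches the situation where no single kernel uses more than a handful of the globally declared textures.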
This compiles into many different kernels in the .cubin, but none of them accesses more than approximately 8 textures. Unfortunately, the 37 textures are listed as texunit 0 through texunit 36 at the start of the .cubin, so the numbering appears to be per file rather than per kernel. Consequently, the textures numbered above 31 cannot be read in the kernels: as far as I can tell they return 0, whereas everything works in emulation mode.
Personally, I can move on for the moment by reordering my textures, since not all of them are actually in use yet. But it will eventually cause me problems, so I am asking whether anyone can suggest a good solution.
Otherwise, I am really impressed with the power of template programming available in CUDA. No more coding specifically for 1D, 2D, 3D, and 4D. Thanks, NVIDIA!
Regards,
-Thomas