(1) The total number of texture references my program can have? I mean “texture<>” declarations, defined at file scope.
(2) The number of texture references a given kernel can use?
I seem to be encountering a very low limit (12 texture references total) on a GTX 285. Surprisingly, there is very little information about this in these forums or on the web in general, and what’s out there is inconsistent. I’ve heard everything from 4 to 32 to 512 to no limit.
How about loading a CUBIN via the driver API? Are textures given a place in the CUBIN? I am just so confused about the whole thing. And how about the multi-GPU case?
I think one of the key problems at the moment is the lack of function pointer/subroutine support in kernels. It seems that texture access winds up being translated into inline assembly during compilation, so everything needed to make the texture fetch happen has to be available to the compiler in the same compilation object. If it weren’t inlined, it might be possible to leave a dangling symbol and have the driver match everything up just in time at runtime, something like the way a modern shared library runtime linker works. That has side effects though: program launch times could be much longer than now, especially with complex applications, and you get a new situation where a CUDA app that compiles without error doesn’t run and instead returns a bunch of symbol or object errors, which in many ways is a harder and more complex set of problems to debug than what we have now. It also adds functionality, complexity, and overhead to the driver, which is already a large and complex piece of code.
With the arrival of Fermi, it will be interesting to see how the tool chain develops, but as it is now I don’t see how it could be done.
Current hardware can’t dynamically index into an array of texture references (samplers). Note that it is possible to bind different arrays (textures) to your texture references at kernel launch time.
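To illustrate the second point, here is a minimal sketch of one file-scope texture reference being bound to two different CUDA arrays before successive launches of the same kernel. It uses the legacy texture reference API this thread is about (later deprecated, and removed in CUDA 12). The names `texIn`, `sumTexels`, `makeArray`, `sumOf` and `demoRebind` are made up for the example, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One file-scope texture reference. Defaults: point filtering,
// unnormalized coordinates, clamp addressing.
texture<float, 2, cudaReadModeElementType> texIn;

// Sums every texel of whatever array is currently bound to texIn.
// Single-threaded on purpose; this is not meant to be fast.
__global__ void sumTexels(float *out, int width, int height)
{
    float s = 0.0f;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            s += tex2D(texIn, x + 0.5f, y + 0.5f);
    *out = s;
}

// Hypothetical helper: uploads a w x h float image into a new cudaArray.
static cudaArray *makeArray(const std::vector<float> &host, int w, int h)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *arr = 0;
    cudaMallocArray(&arr, &desc, w, h);
    cudaMemcpy2DToArray(arr, 0, 0, &host[0], w * sizeof(float),
                        w * sizeof(float), h, cudaMemcpyHostToDevice);
    return arr;
}

// Hypothetical helper: binds arr to texIn on the host, launches the
// kernel, and returns the sum it computed.
static float sumOf(cudaArray *arr, int w, int h)
{
    float *dOut = 0;
    float hOut = 0.0f;
    cudaMalloc((void **)&dOut, sizeof(float));
    cudaBindTextureToArray(texIn, arr);   // choose the data before launch
    sumTexels<<<1, 1>>>(dOut, w, h);
    cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaUnbindTexture(texIn);
    cudaFree(dOut);
    return hOut;
}

// Same kernel, same texture reference, two different arrays.
static bool demoRebind()
{
    const int w = 4, h = 4;
    std::vector<float> ones(w * h, 1.0f), twos(w * h, 2.0f);
    cudaArray *a = makeArray(ones, w, h);
    cudaArray *b = makeArray(twos, w, h);
    float sa = sumOf(a, w, h);
    float sb = sumOf(b, w, h);
    cudaFreeArray(a);
    cudaFreeArray(b);
    return sa == 16.0f && sb == 32.0f;
}
```

The choice of data is made on the host, between launches. Inside the kernel the texture reference is still a fixed compile-time name, which is the restriction described above.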
We are thinking about adding support for texture arrays, which would let you dynamically index into an array of identically-sized images. Note that you can already do this today in CUDA by using a 3D texture, but there are size restrictions.
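A rough sketch of the 3D texture workaround, assuming the images are stacked as z-slices of one 3D CUDA array so that the slice index can be an ordinary run-time value. It again uses the legacy texture reference API. The names `texStack`, `fetchTexel`, `fetchFromStack` and `demoStack` are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// A single 3D texture reference holding a stack of same-sized 2D images.
// Default point filtering is assumed; linear filtering would blend
// neighbouring slices together.
texture<float, 3, cudaReadModeElementType> texStack;

// Reads texel (x, y) of image `layer`. The layer index is computed at
// run time, unlike the choice of a texture reference.
__global__ void fetchTexel(float *out, int x, int y, int layer)
{
    *out = tex3D(texStack, x + 0.5f, y + 0.5f, layer + 0.5f);
}

// Hypothetical helper: uploads `layers` images of size w x h (stored back
// to back in `images`) into a 3D array and fetches one texel from it.
static float fetchFromStack(const std::vector<float> &images,
                            int w, int h, int layers,
                            int x, int y, int layer)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(w, h, layers);
    cudaArray *arr = 0;
    cudaMalloc3DArray(&arr, &desc, extent);

    cudaMemcpy3DParms p = {0};
    p.srcPtr = make_cudaPitchedPtr((void *)&images[0],
                                   w * sizeof(float), w, h);
    p.dstArray = arr;
    p.extent = extent;
    p.kind = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&p);

    float *dOut = 0;
    float hOut = 0.0f;
    cudaMalloc((void **)&dOut, sizeof(float));
    cudaBindTextureToArray(texStack, arr);
    fetchTexel<<<1, 1>>>(dOut, x, y, layer);
    cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);

    cudaUnbindTexture(texStack);
    cudaFree(dOut);
    cudaFreeArray(arr);
    return hOut;
}

// Fills four 4x4 images with the value 100*layer + 10*y + x and reads
// texel (2, 1) of layer 3 back through the texture.
static bool demoStack()
{
    const int w = 4, h = 4, layers = 4;
    std::vector<float> images(w * h * layers);
    for (int z = 0; z < layers; ++z)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                images[(z * h + y) * w + x] = 100.0f * z + 10.0f * y + x;
    return fetchFromStack(images, w, h, layers, 2, 1, 3) == 312.0f;
}
```

Sampling at `layer + 0.5f` with point filtering selects exactly one slice, so the stack behaves like an indexable array of images as long as every image has the same dimensions and the whole stack fits within the 3D texture size limits.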