Our next radio simulation kernel will require an array of several thousand CUDA texture objects (currently we’ve set the limit to 10000). At 8 bytes per handle that is about 80 KB, which won’t fit into the 64 KB of constant memory, and hence it cannot be passed as kernel launch arguments.
I was wondering if there is a good way of putting this array itself into a texture to speed up access. We’ll have several consecutive threads access the same texture object - hence the texture cache should help a lot.
The problem is that a cudaTextureObject_t is typedef’d as unsigned long long - so how does one create a texture capable of storing such a behemoth object? Maybe split it into lo and hi parts and use a two-channel integer texture? Are there even 32-bit integer textures? Or maybe we could use a two-channel 32-bit float texture and use the __float_as_int() intrinsic, followed by reassembly of the lo and hi parts into a 64-bit integer… hmm…
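To make the lo/hi idea concrete, here is a minimal sketch of what I have in mind. It has not been tried in our real kernel. It assumes the handle array sits in plain linear device memory and is viewed through a two-channel 32-bit unsigned integer texture (`uint2`), read with `tex1Dfetch` in element-type read mode. The names `fetchHandle`, `copyHandles`, `roundTripThroughTexture` and `check` are made up for illustration, and the values pushed through are arbitrary 64-bit placeholders rather than real texture handles.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// The whole trick relies on one handle being exactly two 32-bit words.
static_assert(sizeof(cudaTextureObject_t) == sizeof(uint2), "handle must be 64 bits");

// Hypothetical helper: abort on any CUDA runtime error.
static void check(cudaError_t e)
{
    if (e != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e));
        std::exit(1);
    }
}

// Fetch entry i of the handle table and glue the two halves back together.
// The device is little-endian, so .x holds the low word and .y the high word.
__device__ cudaTextureObject_t fetchHandle(cudaTextureObject_t table, int i)
{
    uint2 p = tex1Dfetch<uint2>(table, i);
    return (static_cast<unsigned long long>(p.y) << 32) | p.x;
}

// Test kernel: copies every reassembled handle into a plain output array.
__global__ void copyHandles(cudaTextureObject_t table, unsigned long long* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fetchHandle(table, i);
}

// Host side: upload 64-bit values, wrap them in a uint2 texture, read them back.
std::vector<unsigned long long> roundTripThroughTexture(const std::vector<unsigned long long>& handles)
{
    const int n = static_cast<int>(handles.size());
    if (n == 0) return {};
    const size_t bytes = n * sizeof(unsigned long long);

    unsigned long long* dHandles = nullptr;
    unsigned long long* dOut = nullptr;
    check(cudaMalloc(&dHandles, bytes));
    check(cudaMalloc(&dOut, bytes));
    check(cudaMemcpy(dHandles, handles.data(), bytes, cudaMemcpyHostToDevice));

    // Describe the linear buffer as n texels of two 32-bit unsigned channels.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dHandles;
    res.res.linear.desc = cudaCreateChannelDesc<uint2>();
    res.res.linear.sizeInBytes = bytes;

    cudaTextureDesc tex = {};
    tex.readMode = cudaReadModeElementType;  // raw integers, no normalization

    cudaTextureObject_t table = 0;
    check(cudaCreateTextureObject(&table, &res, &tex, nullptr));

    copyHandles<<<(n + 255) / 256, 256>>>(table, dOut, n);
    check(cudaGetLastError());

    std::vector<unsigned long long> result(n);
    check(cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost));

    check(cudaDestroyTextureObject(table));
    check(cudaFree(dHandles));
    check(cudaFree(dOut));
    return result;
}
```

In the real kernel the value returned by `fetchHandle` would be passed straight on to the usual `tex1D`/`tex2D` calls. If this works as sketched, the float texture plus `__float_as_int()` detour would not be needed. Whether going through the texture path is actually faster than an ordinary global memory load of the same array is something this sketch does not answer.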