Many texture bindings: please advise me on a better implementation!

Hi,

I have another question.

Suppose I have N (ca. 20-50) arrays of floats with different lengths L(i) (L(i) > 100000, i = 1, …, N). Most of them are only read in my kernel. The question is simple: which is the better way to program this?

  1. make N separate cudaBindTexture calls and keep all descriptors in shared memory for each thread, or

  2. pack all arrays into one very large array of size s = sum_{i=1}^N L(i), call cudaBindTexture once for this very large array, and pass one descriptor plus the offsets of the individual arrays to the kernel?

The shared memory usage will be the same in both cases; the only question is which is better, one binding or several.

Thank you for your kind advice!

Sincerely

Ilghiz Ibraghimov

If you have 50 such arrays, it will certainly be less of a code management headache to bind just one array. If you bind it using cudaBindTexture and read it with tex1Dfetch, you shouldn’t run into any max texture size problems.
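For what it's worth, the host-side packing for the single-array option could look roughly like the sketch below. It is plain C++ rather than CUDA, and Float4, PackedArrays, pack_arrays and packed_index are hypothetical names made up for illustration (Float4 stands in for CUDA's float4), so treat it as one possible layout rather than the way it must be done.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's float4 so the sketch compiles as plain C++.
struct Float4 { float x, y, z, w; };

// Hypothetical container: one contiguous buffer plus the start of each array.
struct PackedArrays {
    std::vector<Float4> data;          // all arrays back to back, sum of L(i) elements
    std::vector<std::size_t> offsets;  // offsets[i] = index of the first element of array i
};

// Concatenate the N input arrays and record where each one starts.
PackedArrays pack_arrays(const std::vector<std::vector<Float4>>& arrays) {
    PackedArrays packed;
    std::size_t total = 0;
    for (const auto& a : arrays) total += a.size();
    packed.data.reserve(total);
    packed.offsets.reserve(arrays.size());
    for (const auto& a : arrays) {
        packed.offsets.push_back(packed.data.size());
        packed.data.insert(packed.data.end(), a.begin(), a.end());
    }
    return packed;
}

// Element j of array i lives at this index in the packed buffer.
std::size_t packed_index(const PackedArrays& p, std::size_t i, std::size_t j) {
    return p.offsets[i] + j;
}
```

The packed buffer would then be copied to the device and bound once, and the kernel would read element j of array i by passing offsets[i] + j as the index to tex1Dfetch.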

Just a side note: AFAIK the texture descriptors are not stored in shared memory.

Hi,

thank you for your answer!

Yes, all my arrays are float4, so it is impossible to hit the max texture size (2^27 elements), since 2^27 * sizeof(float4) = 2 GB and all GPUs nowadays have at most 2 GB of main memory :)
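The arithmetic can be checked at compile time with a few lines of plain C++. The 2^27 element limit is the figure quoted in this thread, not something the code verifies, and the constant names are hypothetical.

```cpp
#include <cstdint>

// Limit quoted in the thread for a texture bound to linear memory: 2^27 elements.
constexpr std::uint64_t kMaxTexels = std::uint64_t{1} << 27;

// A float4 is four 32-bit floats, i.e. 16 bytes per element.
constexpr std::uint64_t kBytesPerFloat4 = 4 * sizeof(float);

// Bytes needed for a float4 texture at the element limit.
constexpr std::uint64_t kMaxTextureBytes = kMaxTexels * kBytesPerFloat4;

// 2^27 elements * 16 bytes = 2^31 bytes = 2 GiB.
static_assert(kMaxTextureBytes == (std::uint64_t{1} << 31),
              "a full float4 texture occupies exactly 2 GiB");
```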

I had assumed I would place them directly in shared memory by copying from global memory. Are there any hidden structures allocated by cudaBindTexture that stay in global memory?

Thank you!

Sincerely

Ilghiz

Textures are handled by special hardware units. A texture descriptor in the final compiled cubin is just an integer :) When you perform a texture read, the texture unit identified by that integer is asked to perform the read, so all the information you specify in cudaBindTexture (the size of the texture and so on) is stored in the texture unit.

Now, whether that texture unit reads the information from global memory or has its own special registers is something we don’t know. But it doesn’t really matter. It is all handled by cudaBindTexture and nvcc. You don’t have to explicitly manage your texture descriptors unless you write PTX by hand.

Hi,

I do not agree with you that it doesn’t matter where the texture unit keeps this information! If it is kept in registers or shared memory, it can require more of them; if it is in global memory, it can cause a certain slowdown. Hence, I agree with you that one descriptor is better than several of them for the original question of this topic.

Sincerely

Elena