Putting a linearized 3D array in texture memory


I’m using CUDA to accelerate a medical physics algorithm and I have been told that using textures for read-only arrays would be a good way to optimize it.

In a first attempt using only global memory, I decided to linearize the 3D array with respect to the x coordinate, thinking that using larger block sizes in x would make the threads fetch memory locations that are close together. It did not work as expected, probably because there is A LOT of divergence in the kernel anyway…

Since the array I am trying to use does not need to change during computation, I was told that texture memory would be a good bet. I tried putting it in a 1D texture, but it’s probably way too big for that… 128x128x128…

I would like to know what the best choice would be for me, since my array is already linearized: whether to put it in a 2D array and use the simple texture example as a reference, or to use a fancier approach with pitched pointers, channel descriptors, and extents.
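For what the "fancier approach" looks like in practice, here is a hedged sketch (not a drop-in solution, and with no error checking) of copying a linearized 128³ float volume into a 3D CUDA array and wrapping it in a texture object. Names like `h_density` and `makeDensityTexture` are placeholders of mine:

```cuda
#include <cuda_runtime.h>

// Sketch: bind a linearized nx*ny*nz float volume to a 3D texture object.
cudaTextureObject_t makeDensityTexture(const float* h_density,
                                       int nx, int ny, int nz) {
    // Allocate a 3D CUDA array with a single-float channel.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(nx, ny, nz);
    cudaArray_t d_array;
    cudaMalloc3DArray(&d_array, &desc, extent);

    // Copy the x-major linearized host volume into the 3D array.
    cudaMemcpy3DParms copy = {};
    copy.srcPtr   = make_cudaPitchedPtr((void*)h_density,
                                        nx * sizeof(float), nx, ny);
    copy.dstArray = d_array;
    copy.extent   = extent;
    copy.kind     = cudaMemcpyHostToDevice;
    cudaMemcpy3D(&copy);

    // Describe the resource and sampling mode, then create the texture object.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = d_array;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.addressMode[2] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;  // or cudaFilterModeLinear
    texDesc.readMode   = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
    return tex;
}

// In a kernel, sample with: float rho = tex3D<float>(tex, x, y, z);
```

One possible attraction of this route for a density map is free trilinear interpolation via `cudaFilterModeLinear`, which a raytracer sampling between voxel centers may want anyway.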

Obviously I’m more of a physicist than a programmer, so something simple would be great.


edit: To be more precise, it’s a radiation raytracing algorithm that needs to read a (3D) density map along the way…

From the vague description it sounds like reading through the texture path would indeed be appropriate. But on modern GPUs, you don’t need to set up explicit textures to take advantage of that. Instead, look into using the __ldg() intrinsic (see documentation).
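As a minimal sketch of what that looks like (kernel and names are mine, not from the original code), assuming the density map stays read-only for the kernel's lifetime:

```cuda
// Sketch: force reads through the read-only (texture) data path with __ldg().
// const + __restrict__ also tells the compiler the data is read-only,
// which lets it generate LDG on its own in many cases.
__global__ void sampleDensity(const float* __restrict__ density,
                              float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __ldg(&density[i]);  // read via the texture path (sm_35+)
    }
}
```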

__ldg() maps to the LDG machine instruction, which reads through the texture path. The compiler can also generate LDG automatically, but if you want to be absolutely sure that LDG is used, use the intrinsic. Note that if the compiler already happens to use LDG everywhere, you won’t see a performance increase over your current code. You can use cuobjdump --dump-sass to compare the generated machine code before and after.
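Concretely, the check could look like this (binary and file names are placeholders; requires the CUDA toolkit on the path):

```shell
# Compile, then inspect the generated SASS for LDG instructions.
nvcc -arch=sm_70 -o raytrace raytrace.cu
cuobjdump --dump-sass raytrace | grep "LDG"
```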