Reference to SOTA for fast Trilinear Interpolation? Texture Memory? Unified Memory?

Hi all,

It’s been roughly six years since I last looked at some CUDA code I wrote that uses texture memory to perform medical imaging routines involving many thousands of trilinear interpolations in a 3D voxelized space. Using texture memory and tex3D has worked very well for this.

I’m trying to understand some of the latest advancements in GPU architecture. Has the unified memory model of recent CUDA versions made explicitly allocating texture memory obsolete? If so, I would greatly appreciate a link to an example showing how this is done in modern practice.

Or perhaps I am overthinking this and should stick with tex3D calls for future work.

If you want to take advantage of hardware texture interpolation, nothing has changed, other than that texture references are now deprecated and will go away soon. If your code is based on texture references, you would want to change it to use texture objects. Since texture objects were introduced with CUDA 5.0 for Kepler, more than 6 years ago, your code may have used texture objects from the start.

Hardware texture interpolation computes its weights in 1.8 fixed-point format, i.e. with only 8 fractional bits (see the texture fetching appendix in the CUDA Programming Guide). This coarse granularity limits the quality of the results. I am aware of a trend in CT, for example, towards higher-quality images that require more fine-grained interpolation, which would motivate using single-precision FMAs to perform the interpolation in software. I do not have sufficient domain knowledge to gauge how pronounced this trend is.
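To make the effect of the 1.8 fixed-point weights concrete, here is a small host-side C++ sketch (not the actual hardware path) of 1D linear filtering in unnormalized coordinates, following the formula in the Programming Guide’s texture fetching appendix: xB = x - 0.5, i = floor(xB), alpha = frac(xB). The function name lerp_1d, the clamp addressing, and rounding alpha to the nearest multiple of 1/256 are assumptions for illustration; the real hardware rounding may differ.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Clamp an index to [0, n-1], mimicking clamp addressing (an assumption).
static int clamp_index(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

// Hypothetical helper: 1D linear filtering in unnormalized coordinates using
// the texture fetch formula xB = x - 0.5, i = floor(xB), alpha = frac(xB).
// If quantize_alpha is true, alpha is rounded to a multiple of 1/256 to mimic
// the 8 fractional bits of a 1.8 fixed-point weight (rounding mode assumed).
float lerp_1d(const std::vector<float>& t, float x, bool quantize_alpha) {
    assert(!t.empty());
    const int n = static_cast<int>(t.size());
    const float xb = x - 0.5f;
    const float fi = std::floor(xb);
    float alpha = xb - fi;  // fractional weight in [0, 1)
    if (quantize_alpha) alpha = std::round(alpha * 256.0f) / 256.0f;
    const int i = static_cast<int>(fi);
    const float t0 = t[clamp_index(i, n)];
    const float t1 = t[clamp_index(i + 1, n)];
    // Same FMA form in both modes, so only the weight precision differs.
    return std::fma(alpha, t1 - t0, t0);  // t0 + alpha * (t1 - t0)
}
```

With t = {0, 1} and x = 0.5015, the quantized version returns exactly 0 while the unquantized version returns about 0.0015. Under the round-to-nearest assumption, the weight error can reach 1/512 of the difference between neighbouring texels.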

While the use of textures was often necessary on early GPUs to maximize memory throughput, this is much less true of modern GPUs, whose greatly improved memory subsystems cache read-only data well even without textures. As one bit of anecdotal evidence, I have not used textures in half a dozen years or so.

What you might want to do is set up a small experiment: use your legacy code to establish baseline performance, then write a new version that uses software interpolation without textures. Study the Best Practices Guide on how to maximize memory throughput, and make sure to use FMAs as aggressively as possible. I would also recommend using the latest CUDA toolchain, as I still see evidence of continuous incremental improvements in the compiler’s code generation. Obviously you would want to use modern hardware, say Pascal or a later architecture (Volta, Turing, Ampere).
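A minimal sketch of the software side of such an experiment: trilinear interpolation with one FMA per linear interpolation step, written as plain C++ so it can be checked on the host. The Volume struct, its x-fastest layout, and clamp-to-edge addressing are hypothetical choices; the half-voxel offset mirrors what tex3D does with unnormalized coordinates and linear filtering, but the weights stay in full single precision. In a .cu file one would likely mark these functions __host__ __device__ and could use fmaf; the memory-access tuning from the Best Practices Guide is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense volume stored x-fastest: index = (z * ny + y) * nx + x.
struct Volume {
    int nx, ny, nz;
    std::vector<float> data;

    // Fetch one voxel with clamp-to-edge addressing (an assumption).
    float at(int x, int y, int z) const {
        x = x < 0 ? 0 : (x >= nx ? nx - 1 : x);
        y = y < 0 ? 0 : (y >= ny ? ny - 1 : y);
        z = z < 0 ? 0 : (z >= nz ? nz - 1 : z);
        return data[(static_cast<std::size_t>(z) * ny + y) * nx + x];
    }
};

// One subtraction plus one FMA per lerp: a + w * (b - a).
inline float lerp_fma(float a, float b, float w) { return std::fma(w, b - a, a); }

// Software trilinear interpolation in unnormalized coordinates, using the same
// half-voxel offset as linear texture filtering, with float weights.
float trilinear(const Volume& v, float x, float y, float z) {
    assert(v.nx > 0 && v.ny > 0 && v.nz > 0);
    assert(v.data.size() == static_cast<std::size_t>(v.nx) * v.ny * v.nz);
    const float xb = x - 0.5f, yb = y - 0.5f, zb = z - 0.5f;
    const float fx = std::floor(xb), fy = std::floor(yb), fz = std::floor(zb);
    const float wx = xb - fx, wy = yb - fy, wz = zb - fz;  // weights in [0, 1)
    const int i = static_cast<int>(fx), j = static_cast<int>(fy), k = static_cast<int>(fz);
    // 4 lerps along x, 2 along y, 1 along z: 7 FMAs in total.
    const float c00 = lerp_fma(v.at(i, j, k),         v.at(i + 1, j, k),         wx);
    const float c10 = lerp_fma(v.at(i, j + 1, k),     v.at(i + 1, j + 1, k),     wx);
    const float c01 = lerp_fma(v.at(i, j, k + 1),     v.at(i + 1, j, k + 1),     wx);
    const float c11 = lerp_fma(v.at(i, j + 1, k + 1), v.at(i + 1, j + 1, k + 1), wx);
    const float c0 = lerp_fma(c00, c10, wy);
    const float c1 = lerp_fma(c01, c11, wy);
    return lerp_fma(c0, c1, wz);
}
```

For a 4x4x4 volume filled with x + 2y + 3z at voxel indices, trilinear(v, 1.75f, 2.25f, 1.5f) returns 7.75, since trilinear interpolation reproduces linear data exactly away from the borders. Timing this against the tex3D version on the same inputs would give the comparison described above.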

njuffa, thank you very much for clarifying this for me. Your input has helped a lot.

Yes, I think I will need to run some benchmark tests that incorporate your helpful suggestions, and compare the results with and without textures on modern hardware.

Again thank you very much.

Just to be clear: switching from low-accuracy texture-based interpolation to high-accuracy software interpolation using single-precision FMAs will have an impact on performance. As I recall from CT examples I saw, there was a slowdown by a factor of 2 for a 1024³ volume. Given the significant improvements in GPU memory and computational throughput since the Kepler architecture, I would expect a non-texture, software-interpolated version to be sufficiently fast at realistic resolutions while providing noticeably improved quality.

For large volumes like 1024³, you would likely want something like an RTX 2080 Ti or an RTX 3070 as the minimum hardware platform.
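For a sense of scale, a quick size calculation, assuming one 32-bit float per voxel and the whole volume resident in device memory (both assumptions; real pipelines may tile or stream the data). The helper names volume_bytes and volume_gib are hypothetical.

```cpp
#include <cstdint>

// Bytes needed for an n x n x n volume; bytes_per_voxel = 4 for 32-bit float.
constexpr std::uint64_t volume_bytes(std::uint64_t n, std::uint64_t bytes_per_voxel) {
    return n * n * n * bytes_per_voxel;
}

// 1 GiB = 2^30 bytes.
constexpr std::uint64_t GiB = std::uint64_t{1} << 30;

// Size in GiB as a floating-point value, e.g. for printing.
constexpr double volume_gib(std::uint64_t n, std::uint64_t bytes_per_voxel) {
    return static_cast<double>(volume_bytes(n, bytes_per_voxel)) / static_cast<double>(GiB);
}
```

volume_bytes(1024, 4) is 4,294,967,296 bytes (4 GiB), which fits in the 11 GB of an RTX 2080 Ti or the 8 GB of an RTX 3070, while volume_bytes(2048, 4) is 34,359,738,368 bytes (32 GiB), which would not fit on either card in one piece.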

While not an exact match for the OP’s use case, the following recent paper might be of interest, as it discusses many of the general challenges behind this kind of image processing, covering platforms up to Pascal and CUDA 8. The authors use texture-based interpolation on volumes of up to 2048³ (2048 projections at a resolution of 2048x2048).

Suren Chilingaryan, Evelina Ametova, Andreas Kopmann, Alessandro Mirone, “Reviewing GPU architectures to build efficient back projection for parallel geometries”, Journal of Real-Time Image Processing (2020) 17:1331–1373

Thank you, njuffa! I’m forwarding this to our research group. Thanks for finding this paper; it is very relevant.

If you find that article useful, maybe the following is of interest as well. In it, the authors discuss the limited granularity of hardware texture interpolation and possible mitigation measures, but seem to compare against software interpolation using double-precision arithmetic (which I found strange; maybe I misread when skimming through it).
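A rough sketch of why a double-precision reference may add little for the interpolation step itself: compare a single-precision FMA lerp against a double-precision one over a grid of inputs in [0, 1]. The function names and the input range are assumptions, and this only looks at a single lerp; it says nothing about what the paper actually measured.

```cpp
#include <cassert>
#include <cmath>

// Single-precision lerp with one FMA: a + w * (b - a).
float lerp_f(float a, float b, float w) { return std::fma(w, b - a, a); }

// Double-precision reference for the same operation.
double lerp_d(double a, double b, double w) { return std::fma(w, b - a, a); }

// Largest |lerp_f - lerp_d| over a deterministic grid of inputs in [0, 1].
double max_float_vs_double(int steps) {
    assert(steps > 0);
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i)
        for (int j = 0; j <= steps; ++j)
            for (int k = 0; k <= steps; ++k) {
                const float a = float(i) / steps, b = float(j) / steps, w = float(k) / steps;
                const double d = std::fabs(double(lerp_f(a, b, w)) - lerp_d(a, b, w));
                if (d > worst) worst = d;
            }
    return worst;
}
```

On IEEE-754 hardware the worst-case difference here stays on the order of 1e-7, several orders of magnitude below the 1/256 weight granularity of 1.8 fixed point.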

Rui Liu, Lin Fu, Bruno De Man, Hengyong Yu, “GPU-based Branchless Distance-Driven Projection and Backprojection”, IEEE Transactions on Computational Imaging (December 2017) 3(4):617–632