Why do recursive data structures and irregular memory accesses perform poorly?


I read somewhere that CUDA applications that use recursive data structures or irregular memory access patterns may perform poorly. Can anybody explain the reason behind this?

Thanks in advance,


I don’t think it’s true that they will perform “poorly”; they just reach a lower percentage of peak performance. On any processor, programs with “irregular” memory accesses run slower than programs with completely regular accesses.

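It’s true that GPUs get the best memory bandwidth when memory is accessed in a regular way (memory coalescing, where neighboring threads in a warp read neighboring addresses so the hardware can combine them into a few wide transactions), but using textures (which are cached) gives good performance for irregular accesses.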
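To make the coalescing point a bit more concrete, here is a rough back-of-the-envelope model in Python (not CUDA code, and not how the hardware is actually implemented). It counts how many distinct memory segments one 32-thread warp touches for a single load under different access patterns. The 128-byte segment size and all the function names are assumptions made for this sketch; the real transaction granularity depends on the GPU architecture and cache configuration.

```python
import random

WARP_SIZE = 32        # threads per warp on NVIDIA GPUs
SEGMENT_BYTES = 128   # ASSUMED transaction granularity; varies by architecture
ELEM_BYTES = 4        # each thread loads one 32-bit value


def segments_touched(indices):
    """Count the distinct memory segments hit by one warp-wide load.

    `indices` holds the array element index read by each thread in the warp.
    Fewer segments means fewer memory transactions for the same useful data.
    """
    return len({(i * ELEM_BYTES) // SEGMENT_BYTES for i in indices})


def coalesced_indices(base=0):
    """Thread k reads element base + k (the regular, coalesced case)."""
    return [base + lane for lane in range(WARP_SIZE)]


def strided_indices(stride, base=0):
    """Thread k reads element base + k * stride."""
    return [base + lane * stride for lane in range(WARP_SIZE)]


def random_indices(n_elems, seed=0):
    """Thread k reads an arbitrary element, e.g. when chasing pointers."""
    rng = random.Random(seed)
    return [rng.randrange(n_elems) for _ in range(WARP_SIZE)]


if __name__ == "__main__":
    print("coalesced:", segments_touched(coalesced_indices()))
    print("stride 32:", segments_touched(strided_indices(32)))
    print("random   :", segments_touched(random_indices(1 << 20)))
```

In this model the coalesced warp touches a single segment, while a stride of 32 elements puts every thread in its own segment, so the warp moves 32 times as much data to get the same 32 values. Scattered accesses into a large array tend to land near that worst case, which is roughly why irregular access patterns reach a lower fraction of peak bandwidth unless a cache absorbs some of the traffic.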

We have plenty of CUDA applications that operate on recursive data structures with irregular memory accesses and still achieve great performance - for example our OptiX ray tracing system, which traverses complex acceleration structures (BVH trees) in real time: