Why does the performance of using texture memory in the A4000 decrease compared to the RTX4000?

438020661 · November 26, 2024, 1:41am

Why does my 3D reconstruction's forward projection algorithm, which uses texture memory for value retrieval, show significant performance improvements over global memory on the P4000 and RTX 4000, but a performance decline relative to global memory on the A4000 and A2000? Furthermore, on the A4000 the texture-memory version of my algorithm runs more than 30% slower than on the RTX 4000. What is the reason for this, and is it related to the architecture of the A2000 and A4000? My kernel has not been specifically optimized for any particular GPU, nor does it use any low-level instruction optimizations; it simply performs some value fetches and calculations.

njuffa · November 26, 2024, 5:38am

I've read the question three times now and I am still confused as to what the relative performance of the various GPUs and access modes actually is. Could you express it as a performance ratio normalized to the slowest configuration, that is:

GPU             | global memory | texture | bandwidth  |
----------------+---------------+---------+------------+
Quadro P4000    |      1.0      |    ?    | 243.3 GB/s |
Quadro RTX 4000 |       ?       |    ?    | 416.0 GB/s |
RTX A2000       |       ?       |    ?    | 288.0 GB/s |
RTX A4000       |       ?       |    ?    | 448.0 GB/s |
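A minimal sketch of the normalization being asked for, assuming kernel runtimes (in milliseconds) have been measured for each GPU and access mode. The timings below are hypothetical placeholders, not data from this thread; the slowest configuration maps to 1.0 and larger values mean faster.

```python
def normalize_to_slowest(runtimes_ms):
    """Map {(gpu, path): runtime_ms} to speed relative to the slowest entry.

    The slowest configuration gets 1.0; a value of 2.0 means twice as fast.
    """
    slowest = max(runtimes_ms.values())
    return {key: slowest / t for key, t in runtimes_ms.items()}


# Hypothetical placeholder timings in ms, NOT measurements from this thread.
runtimes = {
    ("Quadro P4000", "global"): 10.0,
    ("Quadro P4000", "texture"): 8.0,
    ("RTX A4000", "global"): 5.0,
    ("RTX A4000", "texture"): 6.5,
}

for (gpu, path), ratio in normalize_to_slowest(runtimes).items():
    print(f"{gpu:<15} {path:<8} {ratio:.2f}")
```

Filling each "?" cell with the corresponding ratio makes the texture vs. global comparison on each GPU directly readable.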

In the absence of actual data: beyond differences in raw memory bandwidth between GPU architectures, a plausible hypothesis is that the performance of the non-texture access path has improved more than that of the texture path. The general-purpose cache hierarchy, which is the more commonly used path, keeps receiving improvements, whereas texture is a specialized (and increasingly niche) way of accessing memory ("make the common case fast, and keep the uncommon case functional").

[Later:]

I have added the raw GPU memory bandwidth for the different GPUs to the table (the data was taken from the TechPowerUp database). This data suggests that you should see slightly better performance (by a few percent only) on the RTX A4000 than on the Quadro RTX 4000.
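As a rough check, assuming (as a simplification) that the kernel is purely bandwidth-bound, the expected speed ratio between two GPUs is just the ratio of their peak bandwidths from the table. A minimal sketch; the helper name is illustrative:

```python
# Peak DRAM bandwidth in GB/s, as listed in the table (TechPowerUp figures).
bandwidth_gbs = {
    "Quadro P4000": 243.3,
    "Quadro RTX 4000": 416.0,
    "RTX A2000": 288.0,
    "RTX A4000": 448.0,
}


def bandwidth_bound_speedup(gpu, baseline):
    """Expected speed ratio if runtime scaled only with peak memory bandwidth."""
    return bandwidth_gbs[gpu] / bandwidth_gbs[baseline]


ratio = bandwidth_bound_speedup("RTX A4000", "Quadro RTX 4000")
print(f"A4000 vs RTX 4000: {ratio:.3f}x ({(ratio - 1) * 100:.1f}% faster)")
```

Under this model the A4000 should be about 8% faster, so a texture path that is more than 30% slower on the A4000 points to something other than raw DRAM bandwidth.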
