ldg versus textures

msj · November 12, 2013, 5:40am

I am in the process of upgrading my code from a C1060 to a K20c.

My main kernel used textures to speed up memory accesses on the C1060.

I have been porting it to the K20c, and I have tried 3 approaches:

1. use textures as on the C1060
2. use __ldg for loads along with const __restrict__
3. use const __restrict__ but not __ldg

I find that 1 is faster than 2, which in turn is faster than 3.

Is this to be expected? The difference in speed between 1 and 2 is substantial, about 25%.
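A 25% gap is large enough that it is worth making sure each variant is timed the same way. A minimal sketch of timing one variant with CUDA events, assuming a simple element-wise kernel; the names `time_launch_ms`, `scale_ldg` and `time_scale_ldg` are hypothetical and not from the thread. One warm-up launch is excluded and the result is averaged over several launches.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Times `reps` calls of `launch` (any callable that enqueues work on the
// default stream) and returns the average time per call in milliseconds.
// One untimed warm-up call is made first so one-time costs are excluded.
template <typename Launch>
float time_launch_ms(Launch launch, int reps) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    launch();                 // warm-up, not timed
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r) launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

// Hypothetical stand-in for one of the variants (here: explicit __ldg).
__global__ void scale_ldg(const float* __restrict__ in, float* __restrict__ out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * __ldg(&in[i]);
}

// Allocates n floats, times the kernel and returns ms per launch.
float time_scale_ldg(int n, int reps) {
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    const int block = 256;
    const int grid = (n + block - 1) / block;
    float ms = time_launch_ms([&] { scale_ldg<<<grid, block>>>(d_in, d_out, n); }, reps);
    assert(cudaGetLastError() == cudaSuccess);

    cudaFree(d_in);
    cudaFree(d_out);
    return ms;
}
```

Wrapping each of the three variants' launches in the same helper, with the same `n` and `reps`, gives directly comparable per-launch times.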

eyalhir74 · November 12, 2013, 10:53am

Hi,
From what I have seen, __ldg is faster. Note, however, that option #3 (const __restrict__) is only a hint to the compiler, so if you want to make sure LDG is used, you must call __ldg explicitly.

eyal

njuffa · November 12, 2013, 6:40pm

The LDG instruction is a global memory load that uses the texture path. It has the advantage that it does not require the explicit use of textures. Explicit use of textures causes a certain amount of code clutter and overhead (e.g. for API calls to bind textures), and textures are objects unfamiliar to many programmers new to CUDA. The introduction of LDG therefore improves ease of use.

Whether the use of LDG results in higher or lower performance compared to the use of classical textures depends on the use case; I am not aware of any hard-and-fast rule about that. When comparing the performance of LDG vs regular global loads, I have found that the use of LDG results in higher performance in most cases, but I recall at least one real-life use case where this was not so.

As eyalhir74 points out, declaring pointers as "const __restrict__" facilitates, but does not guarantee, the generation of LDG instructions on Kepler-class GPUs. The reason for this is that the use of "const __restrict__" pointers makes assertions about local read-only behavior, whereas use of LDG requires the data to be read-only for the lifetime of a kernel. Only the use of the __ldg() device function ensures that LDG is generated.
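Because only __ldg() guarantees an LDG instruction, code that must also build for pre-Kepler GPUs such as the C1060 can hide it behind an architecture check. A minimal sketch, assuming float and int data (__ldg() is only overloaded for built-in scalar and vector types, so the template is not fully generic); `load_readonly`, `gather` and `gather_reverses` are hypothetical names. The wrapper cannot check the read-only-for-the-whole-kernel requirement; that stays the caller's responsibility.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// On sm_35 and later, __ldg() forces a load through the read-only (texture)
// path. Older architectures have no LDG, so fall back to a plain global load.
// The caller must guarantee *p is not written by any thread during the kernel.
template <typename T>
__device__ __forceinline__ T load_readonly(const T* p) {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)
    return __ldg(p);
#else
    return *p;
#endif
}

// `in` and `idx` are never written while the kernel runs, so LDG is legal.
__global__ void gather(const float* __restrict__ in, const int* __restrict__ idx,
                       float* __restrict__ out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = load_readonly(&in[load_readonly(&idx[i])]);
}

// Gathers in reverse order and checks the result on the host.
bool gather_reverses(int n) {
    std::vector<float> h_in(n), h_out(n);
    std::vector<int> h_idx(n);
    for (int i = 0; i < n; ++i) {
        h_in[i] = static_cast<float>(i);
        h_idx[i] = n - 1 - i;
    }

    float *d_in = nullptr, *d_out = nullptr;
    int* d_idx = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, h_idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    gather<<<(n + 255) / 256, 256>>>(d_in, d_idx, d_out, n);
    assert(cudaGetLastError() == cudaSuccess);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_idx);
    for (int i = 0; i < n; ++i)
        if (h_out[i] != static_cast<float>(n - 1 - i)) return false;
    return true;
}
```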

msj · November 13, 2013, 2:27am

Thanks.

So it would seem that there is little point in changing code from explicit textures to LDG, since the gains are in ease of programming rather than speed. And if one has already done the programming, there is no benefit.

In 3.5 there are new texture objects. Will these be as fast as the old textures?

njuffa · November 13, 2013, 2:56am

With regard to existing code that already uses textures explicitly, size restrictions on the textures could be a reason to look into alternative approaches on Kepler. Use cases where multiple textures are needed to map a single chunk of data can be quite cumbersome, in particular if more than two textures are required to cover the data. I do not have experience with texture objects.
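To illustrate why covering one array with several textures gets cumbersome, here is a minimal sketch that splits a float array across several 1D texture objects bound to linear memory, so every fetch must first pick the right texture. It assumes texture objects (compute capability 3.0+), a hypothetical fixed cap of 8 chunks, and hypothetical names `ChunkedTex`, `fetch`, `make_chunked`, `copy_via_chunks` and `chunked_roundtrip`; the real per-texture size limit is reported in `cudaDeviceProp::maxTexture1DLinear`.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kMaxChunks = 8;  // hypothetical cap for this sketch

// Several 1D texture objects that together cover one float array.
struct ChunkedTex {
    cudaTextureObject_t tex[kMaxChunks];
    int chunk_elems;  // elements per texture (the last one may hold fewer)
    int num_chunks;
};

// Every read must first select the texture that holds element i.
__device__ float fetch(const ChunkedTex& t, long long i) {
    int c = static_cast<int>(i / t.chunk_elems);
    int off = static_cast<int>(i % t.chunk_elems);
    return tex1Dfetch<float>(t.tex[c], off);
}

__global__ void copy_via_chunks(ChunkedTex t, float* out, long long n) {
    long long i = static_cast<long long>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fetch(t, i);
}

// Binds consecutive slices of d_data (n floats) to separate texture objects.
ChunkedTex make_chunked(float* d_data, long long n, int chunk_elems) {
    int dev = 0;
    cudaGetDevice(&dev);
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);
    // Each slice must fit in one texture and start on a textureAlignment boundary.
    assert(chunk_elems <= prop.maxTexture1DLinear);
    assert(reinterpret_cast<std::uintptr_t>(d_data) % prop.textureAlignment == 0);
    assert((static_cast<size_t>(chunk_elems) * sizeof(float)) % prop.textureAlignment == 0);

    ChunkedTex t = {};
    t.chunk_elems = chunk_elems;
    t.num_chunks = static_cast<int>((n + chunk_elems - 1) / chunk_elems);
    assert(t.num_chunks <= kMaxChunks);
    for (int c = 0; c < t.num_chunks; ++c) {
        long long first = static_cast<long long>(c) * chunk_elems;
        long long count = std::min<long long>(chunk_elems, n - first);
        cudaResourceDesc res = {};
        res.resType = cudaResourceTypeLinear;
        res.res.linear.devPtr = d_data + first;
        res.res.linear.desc = cudaCreateChannelDesc<float>();
        res.res.linear.sizeInBytes = static_cast<size_t>(count) * sizeof(float);
        cudaTextureDesc td = {};
        td.readMode = cudaReadModeElementType;
        cudaCreateTextureObject(&t.tex[c], &res, &td, nullptr);
    }
    return t;
}

// Copies an array through the chunked textures and compares with the input.
bool chunked_roundtrip(int n, int chunk_elems) {
    std::vector<float> h(n), back(n);
    for (int i = 0; i < n; ++i) h[i] = 0.5f * i;
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    ChunkedTex t = make_chunked(d_in, n, chunk_elems);
    copy_via_chunks<<<(n + 255) / 256, 256>>>(t, d_out, n);
    cudaMemcpy(back.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int c = 0; c < t.num_chunks; ++c) cudaDestroyTextureObject(t.tex[c]);
    cudaFree(d_in);
    cudaFree(d_out);
    return h == back;
}
```

With __ldg() the kernel would simply read `in[i]` through one pointer, with none of this bookkeeping. The test uses an artificially small `chunk_elems` so the splitting can be exercised on a small array.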

eyalhir74 · November 13, 2013, 8:44am

I don’t think this is the right conclusion :)

Changing your code to use __ldg is very straightforward, so there is no real reason not to test it. Just replace the tex2D (or whatever you use) in the kernel with the __ldg code and see what performance you get.
You can put it in a macro to easily switch between ldg, texture and regular global memory access to see which yields the best performance.
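A minimal sketch of the macro approach, assuming a simple element-wise kernel over float data; the names `LOAD_MODE`, `LOAD`, `scale` and `scale_matches` are hypothetical. The texture path here uses texture objects (compute capability 3.0+); the texture references used on the C1060 were removed in CUDA 12.0, so on an older toolkit the `LOAD_TEX` branch would instead hold the existing `tex1Dfetch`/`tex2D` call.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Select the load path at compile time, e.g. nvcc -DLOAD_MODE=2 ...
#define LOAD_GLOBAL 0  // plain load through a const __restrict__ pointer
#define LOAD_LDG    1  // explicit __ldg() (sm_35+)
#define LOAD_TEX    2  // tex1Dfetch() on a texture object
#ifndef LOAD_MODE
#define LOAD_MODE LOAD_LDG
#endif

#if LOAD_MODE == LOAD_TEX
#define LOAD(tex, ptr, i) tex1Dfetch<float>((tex), (i))
#elif LOAD_MODE == LOAD_LDG
#define LOAD(tex, ptr, i) __ldg(&(ptr)[(i)])
#else
#define LOAD(tex, ptr, i) ((ptr)[(i)])
#endif

// The kernel body is written once; only the LOAD macro differs between builds.
__global__ void scale(cudaTextureObject_t tex, const float* __restrict__ in,
                      float* __restrict__ out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * LOAD(tex, in, i);
}

bool scale_matches(int n) {
    std::vector<float> h(n), back(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<float>(i);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // The texture is only read in LOAD_TEX mode; creating it unconditionally
    // keeps the host code identical across all three builds.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_in;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);

    scale<<<(n + 255) / 256, 256>>>(tex, d_in, d_out, n);
    assert(cudaGetLastError() == cudaSuccess);
    cudaMemcpy(back.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    for (int i = 0; i < n; ++i)
        if (back[i] != 2.0f * h[i]) return false;
    return true;
}
```

Build once per mode (for example `nvcc -DLOAD_MODE=2`) and time each binary; since only the macro changes, any performance difference comes from the load path alone.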

Eyal
