The LDG instruction is a global memory load that uses the texture path. Its advantage is that it does not require the explicit use of textures. Explicit use of textures causes a certain amount of code clutter and overhead (e.g. for API calls to bind textures), and textures are unfamiliar objects to many programmers new to CUDA. The introduction of LDG therefore improves ease of use.
Whether LDG results in higher or lower performance than classical textures depends on the use case; I am not aware of any hard-and-fast rule about that. When comparing LDG against regular global loads, I have found that LDG results in higher performance in most cases, but I recall at least one real-life use case where it did not.
As eyalhir74 points out, declaring pointers as “const __restrict__” facilitates, but does not guarantee, the generation of LDG instructions on Kepler-class GPUs. The reason is that “const __restrict__” pointers only make assertions about local read-only behavior, whereas the use of LDG requires the data to be read-only for the lifetime of the kernel. Only the use of the __ldg() device function ensures that LDG is generated.
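To make the two approaches concrete, here is a minimal sketch. The kernel names (scale_restrict, scale_ldg), the helper kernels_agree(), and the scaling operation itself are made up for illustration. It assumes a GPU and build target of compute capability 3.5 or higher, since __ldg() is not available below that. In both kernels the input buffer must not be written by anyone while the kernel runs.

```cuda
#include <cstdio>
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Variant 1: const + __restrict__ permits the compiler to emit LDG for the
// loads from 'in', but does not force it to.
__global__ void scale_restrict(const float* __restrict__ in,
                               float* __restrict__ out, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = s * in[i];
}

// Variant 2: __ldg() explicitly requests the load through the read-only path.
__global__ void scale_ldg(const float* in, float* out, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = s * __ldg(&in[i]);
}

// Hypothetical helper: runs both kernels on the same input and checks that
// they produce identical, correct results.
bool kernels_agree(int n)
{
    std::vector<float> h_in(n), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) h_in[i] = 0.5f * i;

    size_t bytes = n * sizeof(float);
    float *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_restrict<<<blocks, threads>>>(d_in, d_a, 2.0f, n);
    scale_ldg<<<blocks, threads>>>(d_in, d_b, 2.0f, n);

    cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost);
    bool ok = (cudaGetLastError() == cudaSuccess);

    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);

    for (int i = 0; ok && i < n; ++i)
        ok = (h_a[i] == h_b[i]) && (h_a[i] == 2.0f * h_in[i]);
    return ok;
}

int main()
{
    printf("%s\n", kernels_agree(1 << 16) ? "match" : "mismatch");
    return 0;
}
```

To see which load instruction the compiler actually chose for each kernel, you can dump the machine code of the compiled binary with cuobjdump -sass and look for LDG (on Kepler, typically shown as LDG or LD.E for the plain load; the exact mnemonics vary by architecture and toolkit version).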