How to use nvprof --metrics gld_efficiency on RTX2080ti

houhongyi · August 21, 2019, 8:49am

I want to measure global memory load efficiency, but when I run

nvprof --metrics gld_efficiency ./HellowWorld.exe

it shows:

nvprof --metrics gld_throughput .\HellowWorld.exe 32 32
======== Warning: Skipping profiling on device 0 since profiling is not supported on devices with compute capability greater than 7.2
==12656== NVPROF is profiling process 12656, command: .\HellowWorld.exe 32 32
SumOnGPU Time Cost 6.138 ms
sumMatrixOnGPU2D <<<(512,512), (32,32)>>>

==12656== Profiling application: .\HellowWorld.exe 32 32
==12656== Profiling result:
No events/metrics were profiled.

How can I measure global memory load efficiency on the RTX 2080 Ti, which has compute capability 7.5?

And when I run it on the 1080 Ti (another GPU in my computer), it shows:

nvprof --metrics gld_throughput .\HellowWorld.exe 32 32
======== Warning: Skipping profiling on device 0 since profiling is not supported on devices with compute capability greater than 7.2
==16568== NVPROF is profiling process 16568, command: .\HellowWorld.exe 32 32
==16568== Error: Internal profiling error 4292:1.
SumOnGPU Time Cost 18.090 ms
======== Error: CUDA profiling error.

OS: Windows 10 x64
CUDA: 10.0

In my code, I use

cudaSetDevice();

to select the GPU, but nvprof still shows "Skipping profiling on device 0". Should I hide the 2080 Ti with an environment variable?

qixotic · August 21, 2019, 12:13pm

If I recall a previous post here correctly, you should be using the Nsight profiler for compute capability above 7.2. My RTX 2080 Ti reports a compute capability of 7.5.

I eagerly await confirmation or correction from those here who really know this stuff.

Robert_Crovella · August 21, 2019, 1:49pm

Yes, to do metric gathering on kernels on Turing GPUs and beyond, you must use Nsight Compute. I recommend using the latest version in CUDA 10.1U1 or 10.1U2 (or whatever is the latest version).

Furthermore, these efficiency metrics are not currently available in Nsight Compute. However, a comparable measure of global load efficiency is global load transactions per request. That metric is not available directly either, but it can be computed from two metrics that are:

l1tex__t_sectors_pipe_lsu_mem_global_op_ld.sum (transactions)
l1tex__t_requests_pipe_lsu_mem_global_op_ld.sum (requests)

Capture both of these metrics, then divide transactions by requests. 100% efficiency corresponds to 4 transactions per request. A higher number of transactions per request (up to a maximum of 32) indicates reduced efficiency.
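
The division described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample counts are hypothetical, and expressing the result as a percentage-style efficiency (4 divided by the measured ratio, capped at 1.0) is an assumption layered on the statement that 4 transactions per request corresponds to 100%.

```python
def sectors_per_request(sectors: float, requests: float) -> float:
    """Ratio of global load transactions (sectors) to global load requests.

    sectors  : value of l1tex__t_sectors_pipe_lsu_mem_global_op_ld.sum
    requests : value of l1tex__t_requests_pipe_lsu_mem_global_op_ld.sum
    """
    if sectors <= 0 or requests <= 0:
        raise ValueError("both counts must be positive")
    return sectors / requests


def estimated_load_efficiency(sectors: float, requests: float,
                              ideal_ratio: float = 4.0) -> float:
    """Rough efficiency estimate in the range (0, 1].

    Assumption: ideal_ratio = 4 transactions per request is treated as
    100% efficiency, as stated in the reply; anything above that scales
    the estimate down proportionally.
    """
    ratio = sectors_per_request(sectors, requests)
    return min(1.0, ideal_ratio / ratio)


if __name__ == "__main__":
    # Hypothetical counts, not taken from a real profiling run.
    for sectors, requests in [(4096, 1024), (32768, 1024)]:
        ratio = sectors_per_request(sectors, requests)
        eff = estimated_load_efficiency(sectors, requests)
        print(f"{ratio:.1f} transactions/request, estimated efficiency {eff:.1%}")
```

The two metric values would be read from the Nsight Compute output for the kernel of interest and passed in by hand or parsed from a report.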

Reference: Nsight Compute CLI (Nsight Compute Documentation)
