nvprof warning: Metric "gld_throughput" cannot be found on device 0

mortennp · September 25, 2015, 7:26pm

Dear Forum

I’m reading the Wrox book “Professional CUDA C Programming” and using the downloadable code samples. Chapter 3 introduces nvprof. However, when I try to profile, I get:

nvprof --metrics gld_throughput ./sumMatrix 32 32
======== Warning: Metric "gld_throughput" cannot be found on device 0.
==4539== NVPROF is profiling process 4539, command: ./sumMatrix 32 32
sumMatrixOnGPU2D <<<(128,128), (32,32)>>> elapsed 0 ms
==4539== Profiling application: ./sumMatrix 32 32
==4539== Profiling result:
No events/metrics were profiled.

My device should have the metric:

nvprof --query-metrics | grep gld_throughput
gld_throughput: Global memory load throughput
nc_gld_throughput: Non-coherent global memory load throughput
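
Note that a plain grep for gld_throughput also matches nc_gld_throughput, as the two lines above show. A minimal sketch of an exact-name check follows; has_metric is a hypothetical helper name, and it assumes each metric is listed as "name: description", as in the output above.

```shell
#!/bin/sh
# has_metric NAME: read `nvprof --query-metrics` style text on stdin and
# succeed only if some line starts (after optional whitespace) with "NAME:".
# Hypothetical helper; assumes NAME contains only letters, digits and underscores.
has_metric() {
    grep -Eq "^[[:space:]]*$1:"
}

# Only call nvprof if it is actually installed on this machine.
if command -v nvprof >/dev/null 2>&1; then
    if nvprof --query-metrics 2>/dev/null | has_metric gld_throughput; then
        echo "gld_throughput is listed"
    else
        echo "gld_throughput is not listed"
    fi
fi
```

This only confirms that the name appears in the query listing; it does not explain why the profiling run itself reports the metric as missing.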

I compile with Makefile options:

%: %.cu
	nvcc -O2 -arch=sm_35 -o $@ $< -lcudadevrt --relocatable-device-code true
%: %.c
	gcc -O2 -std=c99 -o $@ $<

I am running a GT730-based card on Ubuntu 14.04. The problem persists with both Toolkit 6.5 and 7.5.

I’ve tried Google and Forum search to no avail. Any ideas much appreciated!

Best regards,
mortennp

Robert_Crovella · September 25, 2015, 8:10pm

There are two different GT 730 products out there. Run the CUDA sample deviceQuery on your system and post the results here. Also try running your code with cuda-memcheck to see if any errors are reported.
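
A minimal sketch for pulling just the compute capability out of the deviceQuery output, assuming the point of the check is to confirm that the card matches the -arch=sm_35 build target. compute_capability is a hypothetical helper name, and the location of the deviceQuery binary depends on where the samples were built.

```shell
#!/bin/sh
# compute_capability: read deviceQuery-style text on stdin and print the
# value from the first "CUDA Capability Major/Minor version number" line.
# Hypothetical helper, not part of the CUDA toolkit.
compute_capability() {
    awk -F: '/CUDA Capability Major\/Minor version number/ {
        gsub(/[ \t]+/, "", $2)   # strip padding around the value
        print $2
        exit                     # report only the first device
    }'
}

# Usage (path is an assumption; adjust to your samples directory):
#   ./deviceQuery | compute_capability
```

A result of 3.5 would correspond to the sm_35 target used in the Makefile.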

mortennp · September 26, 2015, 6:25pm

Thank you, txbob.

Memcheck seems fine:

cuda-memcheck ./sumMatrix 32 32
========= CUDA-MEMCHECK
sumMatrixOnGPU2D <<<(128,128), (32,32)>>> elapsed 0 ms
========= ERROR SUMMARY: 0 errors

Output from deviceQuery:

./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 730"
CUDA Driver Version / Runtime Version 7.5 / 7.5
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 2048 MBytes (2147287040 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 902 MHz (0.90 GHz)
Memory Clock rate: 900 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GT 730
Result = PASS

/mortennp

Robert_Crovella · September 26, 2015, 6:40pm

I don’t have any great ideas then. I don’t happen to have a GT730, but I have a GT640 which is quite similar (cc3.5 GK208 GPU) on CUDA 7.5, and I can profile the metric gld_throughput on various codes without difficulty on it.

Do you get the same error message if you attempt to profile, for example, a CUDA sample code such as vectorAdd?

What is the output of nvidia-smi on your machine?

mortennp · September 26, 2015, 6:48pm

Yes, same behaviour with vectorAdd from Toolkit samples.

nvidia-smi
+------------------------------------------------------+
| NVIDIA-SMI 352.39     Driver Version: 352.39         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 730      Off  | 0000:01:00.0     N/A |                  N/A |
| N/A   44C    P8    N/A /  N/A |    365MiB /  2047MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

Thank you for responding, txbob!

Robert_Crovella · September 26, 2015, 6:56pm

I’m pretty much out of ideas. I suspect either:

1. A bug in nvprof.
2. A corrupted software install on your machine, of some sort.

If you want to pursue item 1, file a bug at developer.nvidia.com.
If you want to pursue item 2, try reloading CUDA or your OS.

I’m not sure how you installed CUDA; you might make sure that the nouveau driver is removed from your machine.

Instructions for that are in the CUDA installation guide for Linux:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-nouveau
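
Before following the guide, a quick way to see whether nouveau is currently loaded is to look for it in the lsmod listing. This is a minimal sketch; nouveau_loaded is a hypothetical helper name, and it only inspects loaded kernel modules, not whether the driver is blacklisted.

```shell
#!/bin/sh
# nouveau_loaded: read lsmod-style text on stdin and succeed if a module
# named exactly "nouveau" appears in the first column.
# Hypothetical helper for illustration.
nouveau_loaded() {
    grep -Eq '^nouveau[[:space:]]'
}

# If lsmod is unavailable, the pipeline sees empty input and reports "not loaded".
if lsmod 2>/dev/null | nouveau_loaded; then
    echo "nouveau is loaded"
else
    echo "nouveau is not loaded"
fi
```

If it reports that nouveau is loaded, the installation guide linked above describes how to disable it.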

MingVonMongo · September 26, 2015, 8:33pm

Try compiling without the Makefile options:

$ nvcc -arch=sm_35 kernel.cu -o kernel

Then run:

$ nvprof --devices 0 --metrics gld_throughput ./kernel

I am reading the same book (up to chapter 7 so far), and all code samples have worked as expected (with CUDA v7.0).
