NVIDIA Visual Profiler is unable to profile application

I have installed CUDA Toolkit 10.2. The CUDA Samples work fine, but NVIDIA Visual Profiler cannot even start profiling an application. When I click “PROFILE AN APPLICATION”, the following pop-up window appears:

CUDA Initialization Failed
Unable to locate CUDA libraries and establish connection with CUDA driver.
Make sure that CUDA and CUDA runtime libraries are on your path. See the installation guide for more information.
The Visual Profiler will exit now.

Of course, I have set up all the environment variables: CUDA_PATH and CUDA_PATH_V10_2, both as system and as user variables.

If I run nvcc -V, the following result appears:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.2, V10.2.89

CUDA Device Query prints:

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 8192 MBytes (8589934592 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1772 MHz (1.77 GHz)
  Memory Clock rate:                             5005 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

PowerShell systeminfo prints:

OS Name:                   Microsoft Windows 10 Pro
OS Version:                10.0.18363 N/A Build 18363
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Member Workstation
OS Build Type:             Multiprocessor Free
Registered Owner:          N/A
Registered Organization:   N/A
Original Install Date:     05/12/2019, 14:56:13
System Boot Time:          06/12/2019, 09:19:03
System Manufacturer:       Gigabyte Technology Co., Ltd.
System Model:              Z370 HD3
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 158 Stepping 10 GenuineIntel ~4008 Mhz
BIOS Version:              American Megatrends Inc. F6, 01/03/2018
Windows Directory:         C:\WINDOWS
System Directory:          C:\WINDOWS\system32
Boot Device:               \Device\HarddiskVolume2
System Locale:             en-us;English (United States)
Input Locale:              en-us;English (United States)
Time Zone:                 (UTC+01:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
Total Physical Memory:     16.335 MB
Available Physical Memory: 10.720 MB
Virtual Memory: Max Size:  19.279 MB
Virtual Memory: Available: 11.558 MB
Virtual Memory: In Use:    7.721 MB
Page File Location(s):     C:\pagefile.sys
Hotfix(s):                 7 Hotfix(s) Installed.
                           [01]: KB4519573
                           [02]: KB4513661
                           [03]: KB4516115
                           [04]: KB4517245
                           [05]: KB4521863
                           [06]: KB4524569
                           [07]: KB4524570
Network Card(s):           1 NIC(s) Installed.
                           [01]: Intel(R) Ethernet Connection (2) I219-V
                                 Connection Name: Ethernet
                                 DHCP Enabled:    Yes
                                 DHCP Server:     10.24.3.1
                                 IP address(es)
                                 [01]: 10.24.3.140
                                 [02]: fe80::15d6:2aa6:c956:970a
Hyper-V Requirements:      A hypervisor has been detected. Features required for Hyper-V will not be displayed.

Hi etvixi,

This error can occur when the CUPTI DLL is not on your PATH. It is located in the folder

<CUDA_DIR>/extras/CUPTI/lib64

Please add the above path to your PATH environment variable and try running Visual Profiler again. Please let me know if you still see the issue.
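For reference, here is a minimal Python sketch of that fix, assuming CUDA_PATH points at the toolkit root as set up at the top of the thread. Instead of editing the system-wide PATH, it prepends the CUPTI folder for a single Visual Profiler process. The helper names `path_with_cupti` and `launch_nvvp`, and the nvvp.exe location under <CUDA_DIR>\libnvvp, are assumptions for illustration.

```python
import os
import subprocess


def path_with_cupti(cuda_path: str, path: str, sep: str = os.pathsep) -> str:
    """Return `path` with <cuda_path>/extras/CUPTI/lib64 prepended, unless it is already listed."""
    cupti_dir = os.path.join(cuda_path, "extras", "CUPTI", "lib64")
    wanted = os.path.normcase(os.path.normpath(cupti_dir))
    for entry in path.split(sep):
        # normcase lowercases on Windows, so the comparison is case-insensitive there
        if entry and os.path.normcase(os.path.normpath(entry)) == wanted:
            return path  # already present: leave PATH unchanged
    return cupti_dir + sep + path if path else cupti_dir


def launch_nvvp(cuda_path: str) -> subprocess.Popen:
    """Start Visual Profiler with CUPTI on PATH for this child process only."""
    env = dict(os.environ)
    env["PATH"] = path_with_cupti(cuda_path, env.get("PATH", ""))
    # Assumption: nvvp.exe sits in <CUDA_DIR>\libnvvp on a default Windows install.
    return subprocess.Popen([os.path.join(cuda_path, "libnvvp", "nvvp.exe")], env=env)
```

Usage: `launch_nvvp(os.environ["CUDA_PATH"])`. Adding the folder permanently through the Windows environment-variable settings, as suggested above, remains the simpler option for everyday use.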


Thanks,
Ramesh


Ramesh,

I can confirm that your work-around works to solve the reported issue.

What I don’t understand is that NVIDIA Visual Profiler 10.1 works without adding the <CUDA_PATH>/extras/CUPTI/lib64 directory to the PATH environment variable, so this feels like a bug in 10.2.

Is this a “known issue” and will it be fixed in a patch or later release?

Thanks in advance,

Jeremiah


Good to know that it solved your problem.

Yes, this is a known issue in 10.2. We have fixed the problem and improved the related error messages in an upcoming release.

I would say this is not an issue but a change, and it is documented in the “What’s New” section of the CUDA 10.2 profiler documentation: https://docs.nvidia.com/cuda/profiler-users-guide/index.html#whats-new

Quoting from the document:
Starting with CUDA 10.2, Visual Profiler and nvprof use dynamic/shared CUPTI library. Thus it’s required to set the path to the CUPTI library before launching Visual Profiler and nvprof. CUPTI library can be found at /usr/local//extras/CUPTI/lib64 or /usr/local//targets//lib for POSIX platforms and “C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA<cuda-toolkit>\extras\CUPTI\lib64” for Windows.
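Since the profiler now loads CUPTI dynamically, one way to check that requirement before launching Visual Profiler or nvprof is to scan each search-path entry for a CUPTI library file. A minimal sketch, assuming the file names match `cupti64_*.dll` on Windows (like cupti64_2020.1.0.dll mentioned later in this thread) and `libcupti.so*` on Linux; the helper name `find_cupti_dirs` is hypothetical. On Linux the dynamic loader consults LD_LIBRARY_PATH rather than PATH, so scan that variable there.

```python
import fnmatch
import os
from typing import List


def find_cupti_dirs(search_path: str, pattern: str, sep: str = os.pathsep) -> List[str]:
    """Return the entries of `search_path` that contain a file matching `pattern`."""
    hits = []
    for entry in search_path.split(sep):
        if not entry or not os.path.isdir(entry):
            continue  # skip empty or stale entries
        try:
            names = os.listdir(entry)
        except OSError:
            continue  # unreadable directory (e.g. permission denied)
        if any(fnmatch.fnmatch(name, pattern) for name in names):
            hits.append(entry)
    return hits


# Example (Windows): find_cupti_dirs(os.environ.get("PATH", ""), "cupti64_*.dll")
# Example (Linux):   find_cupti_dirs(os.environ.get("LD_LIBRARY_PATH", ""), "libcupti.so*")
```

An empty result suggests the profiler will not find CUPTI and may fail with the “CUDA Initialization Failed” dialog shown at the top of the thread.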

Hi Ramesh,

Thank you, now it works! But there are still some issues. When I click the “Help” menu and then “Help content”, the following message appears:

HTTP ERROR: 500
Problem accessing /help/index.jsp. Reason: 
    Server Error
Powered by Jetty://

This message appears every time I click the “More…” hyperlink in any analysis report.

I was able to resolve my issue with loading nvvp using this solution. However, I get the same HTTP error that etvixi described when I click “More…”. Is there a solution for this?

I have exactly the same problem: this server error message appears whenever I click “More…”.
Quite frustrating, to be honest.

I profiled the nvvp application with ProcMon.

Adding the dynamic library cupti64_2020.1.0.dll to
“C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\libnvvp”

helps.

Adding the same library to <CUDA_DIR>/extras/CUPTI/lib64, on the other hand, does not help at all.
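A minimal Python sketch of the workaround reported above, assuming the default Windows layout in which the CUPTI DLL ships in <CUDA_DIR>\extras\CUPTI\lib64 and Visual Profiler lives in <CUDA_DIR>\libnvvp. The helper name `copy_cupti_to_libnvvp` is hypothetical, and writing under C:\Program Files normally requires an elevated (administrator) prompt.

```python
import glob
import os
import shutil
from typing import List


def copy_cupti_to_libnvvp(cuda_dir: str, pattern: str = "cupti64_*.dll") -> List[str]:
    """Copy every CUPTI DLL matching `pattern` from extras/CUPTI/lib64 into libnvvp.

    Returns the destination paths that were written.
    """
    src_dir = os.path.join(cuda_dir, "extras", "CUPTI", "lib64")
    dst_dir = os.path.join(cuda_dir, "libnvvp")
    copied = []
    # glob.escape keeps characters such as '[' in the install path from acting as wildcards
    for src in sorted(glob.glob(os.path.join(glob.escape(src_dir), pattern))):
        dst = os.path.join(dst_dir, os.path.basename(src))
        shutil.copy2(src, dst)  # preserves timestamps; overwrites an existing copy
        copied.append(dst)
    if not copied:
        raise FileNotFoundError(f"no file matching {pattern!r} in {src_dir}")
    return copied
```

Usage: `copy_cupti_to_libnvvp(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0")`. Placing the DLL next to the executable relies on Windows searching the application's own directory when loading DLLs, which may explain why this location works when the CUPTI folder does not.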