Cannot profile RTX 2060 KO (TU104) with CUDA 11.0 on Windows and Ubuntu

Hello,

From my reading of the documentation (https://developer.nvidia.com/nsight-compute), Nsight Compute and nvprof should be able to produce detailed profiling metrics for any TU1xx chip.
However, it does not work with my RTX 2060.

nvprof runs fine with the "summary" options (just regular tracing):

nvprof.exe 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe'
# or
nvprof.exe -o output.nvvp -f  'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe'

But advanced profiling does not work:


nvprof.exe -o output.nvvp -f --analysis-metrics  'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe' 

This produces the following warning in the output and does not generate any detailed information about the executed kernels:

======== Warning: Skipping profiling on device 0 since profiling is not supported on devices with compute capability 7.5 and higher.
                  Use NVIDIA Nsight Compute for GPU profiling and NVIDIA Nsight Systems for GPU tracing and CPU sampling.
                  Refer https://developer.nvidia.com/tools-overview for more details.

======== Warning: The option --aggregate-mode on has no effect. The --aggregate-mode <on|off> option applies to --events and --metrics options that follow it.
======== Warning: The option --aggregate-mode off has no effect. The --aggregate-mode <on|off> option applies to --events and --metrics options that follow it.
[Vector addition of 50000 elements]
==19508== NVPROF is profiling process 19508, command: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
==19508== Generated result file: C:\Users\Agostini\output.nvvp

I have also tried dual-booting into Ubuntu 20.04 and I receive the same error. Furthermore, on Windows, "MS Visual Studio 2019 > Nsight > Start performance analysis…" detects the device, but upon profiling execution the following error occurs:

Attempted to perform CUDA trace on an unsupported CUDA device. Serialized kernel trace mode has been used.

I have also tried to use Nsight Compute on both Windows and Ubuntu without success (it gives an error, but it is not descriptive).

Is the RTX 2060 KO (TU104) supported by CUDA 11.0 tools?
What consumer cards from the Turing generation support detailed profiling?

Thank you in advance

Hi N B Agostini,

The nvprof and NVIDIA Visual Profiler tools don't support profiling events and metrics on Turing and later GPU architectures; they only support tracing (timeline) activities on Turing. These limitations are documented in the profiler guide in the section https://docs.nvidia.com/cuda/profiler-users-guide/index.html#migrating-to-nsight-tools.

Nsight Compute supports profiling on Turing TU1xx cards. Did you try the GUI or the CLI? Can you please paste the full error log?
Do you encounter the error message "ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device …"? Profiling tools require you to start profiling as the root user, or to have an administrator grant profiling permission to non-root users.
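
If that permission error turns out to be the cause on Linux, NVIDIA's published guidance for ERR_NVGPUCTRPERM is to pass a module option to the nvidia kernel driver. A minimal sketch (the file name is my choice, and a reboot is required for the option to take effect):

```shell
# Sketch based on NVIDIA's ERR_NVGPUCTRPERM guidance: allow non-root users
# to read GPU performance counters via an nvidia kernel-module option.
# In practice this file belongs in /etc/modprobe.d/ and requires a reboot.
conf_line='options nvidia NVreg_RestrictProfilingToAdminUsers=0'
printf '%s\n' "$conf_line" > nvidia-profiling.conf
# Then, as root:  cp nvidia-profiling.conf /etc/modprobe.d/  &&  reboot
cat nvidia-profiling.conf
```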

Thank you for sharing this link and information. I did not come across it during my investigation, and it explains a lot.

Following your suggestion, I am currently trying Nsight Compute on Windows.

My setup is:
Profiling

Target platform - Windows
Application Executable - C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.0/extras/demo_suite/vectorAdd.exe

Activity - Profile
Output file - C:/Users/Agostini/nvvp_workspace/nsight/output
Force Overwrite - Yes
Target Process - Application Only
Command line (auto-generated) - "C:/Program Files/NVIDIA Corporation/Nsight Compute 2020.1.0/target/windows-desktop-win7-x64/ncu.exe" --export C:/Users/Agostini/nvvp_workspace/nsight/output --force-overwrite --target-processes application-only --kernel-regex-base function --launch-skip-before-match 0 --section LaunchStats --section Occupancy --section SpeedOfLight --sampling-interval auto --sampling-max-passes 5 --sampling-buffer-size 33554432 --profile-from-start 1 --cache-control all --clock-control base --apply-rules yes "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.0/extras/demo_suite/vectorAdd.exe"

The profiler attempts to connect to different IP:ports:

==PROF== Attempting to connect to ncu-ui at 10.15.187.74:50160...
==PROF== Connected to ncu-ui at 10.15.187.74:50160.
[Vector addition of 50000 elements]
==PROF== Connected to process 13200 (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe)
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==PROF== Profiling "vectorAdd" - 1: 0%..

At this point the screen goes black for about two seconds, and when it comes back I observe this error:

Launched process: ncu.exe (pid: 1916)
Launch succeeded.
Profiling...
==PROF== Connected to process 15284 (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe)

==PROF== Profiling "vectorAdd" - 1: 
==ERROR== Error: UnknownError

==PROF== Disconnected from process 15284

==ERROR== The application returned an error code (1).

==ERROR== An error occurred while trying to profile.

==PROF== Report: output.ncu-rep

Process terminated.
Loading report file C:/Users/Agostini/nvvp_workspace/nsight/output.ncu-rep...

The file C:/Users/Agostini/nvvp_workspace/nsight/output.ncu-rep opens, but the Details page is incomplete, with yellow exclamation marks for all Speed Of Light metrics.

Additional notes:

  • I am profiling the same GPU that is rendering the display to my monitor
  • I have “allow access to GPU performance counters to all users” on
  • I have tried to run Nsight Compute as Administrator

Running the command line in CMD produces the same error:

[Vector addition of 50000 elements]
==PROF== Connected to process 5184 (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\demo_suite\vectorAdd.exe)
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
==PROF== Profiling "vectorAdd" - 1: 0%....50%....100% - 9 passes

==ERROR== Error: UnknownError
Copy output data from the CUDA device to the host memory
Failed to copy vector C from device to host (error code the launch timed out and was terminated)!
==PROF== Disconnected from process 5184
==ERROR== The application returned an error code (1).
==ERROR== An error occurred while trying to profile.
==PROF== Report: output.ncu-rep

I can possibly try the same on Ubuntu (without a display running) later.

I am unsure what to do now. Let me know what I should try next.

It appears that it is not possible to profile a GPU that is also rendering images to your monitor (Xorg or the Windows UI). Profiling works if the GPU is just rendering a virtual terminal (Ctrl+Alt+Fx).

I switched to Ubuntu 20.04 and tried the Nsight Compute UI with root privileges, but my screen freezes during profiling and the computer restarts (at the same spot at which Windows flashes a black screen). The same happens if I try the command-line interface on Ubuntu.

However, if I switch to a virtual terminal (Ctrl+Alt+F3) and execute ncu with sudo privileges, I am finally able to collect kernel metrics.

# Monitor is receiving the virtual terminal image from the GPU, but the Xorg process is idle.
sudo /usr/local/cuda-11/bin/ncu -o report /usr/local/cuda-11/extras/demo_suite/vectorAdd
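
To check ahead of time whether a display server still holds a context on the GPU, something like this may help (the process names and the nvidia-smi output parsing are assumptions; adapt them to your system):

```shell
# Sketch: succeed if nvidia-smi lists a display-server process on the GPU.
# The process names checked for (Xorg, gnome-shell) are assumptions.
gpu_busy_with_display() {
    command -v nvidia-smi >/dev/null 2>&1 && \
        nvidia-smi 2>/dev/null | grep -Eq 'Xorg|gnome-shell'
}

if gpu_busy_with_display; then
    echo "display server holds the GPU; switch to a virtual terminal first"
else
    echo "no display context detected (or nvidia-smi is unavailable)"
fi
```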

@mjain Is this a bug, or are we not meant to do detailed profiling on the same GPU that renders the display?

Thank you,
Nico

You can profile on the same GPU that is driving the display. However, there are additional restrictions in that case. First of all, on Windows the operating system will forcefully stop any long-running kernel that appears to hang the display, which includes kernels that run longer due to profiling overhead. You can check https://docs.nvidia.com/nsight-compute/ReleaseNotes/index.html#known-issues for details.

Enabling certain metrics can cause GPU kernels to run longer than the driver's watchdog time-out limit. In these cases the driver will terminate the GPU kernel, resulting in an application error, and profiling data will not be available. Please disable the driver watchdog time-out before profiling such long-running CUDA kernels.
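
On Linux, the NVIDIA X driver documents an "Interactive" option that controls this watchdog for the X server. A sketch of the relevant xorg.conf fragment (the identifier name is illustrative; use with care, since this also disables hang detection):

```
Section "Device"
    Identifier "nvidia-gpu"          # illustrative name, an assumption
    Driver     "nvidia"
    Option     "Interactive" "0"     # disables the GPU watchdog for this X screen
EndSection
```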

In addition, results for certain metrics might differ significantly, due to influence from other contexts on the same GPU that cannot be completely isolated.

Profiling a kernel while other contexts are active on the same device (e.g. X server, or secondary CUDA or graphics application) can result in varying metric values for L2/FB (Device Memory) related metrics. Specifically, L2/FB traffic from non-profiled contexts cannot be excluded from the metric results. To completely avoid this issue, profile the application on a GPU without secondary contexts accessing the same device (e.g. no X server on Linux).

Thank you for the suggestion @felix_dt .
I followed the windows instructions:

  1. Opened Nsight Monitor with Run as administrator
  2. Clicked on the tray icon
  3. Clicked on “Nsight Monitor options”
  4. In General > Microsoft Display Driver, I changed “WDDM TDR Enabled” to “False”
  5. Reboot the machine to apply changes
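
For reference, the TDR settings Nsight Monitor changes in step 4 correspond to documented registry values under GraphicsDrivers; a sketch of the equivalent .reg fragment (the values shown are examples, not recommendations):

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
; TdrLevel: 0 = detection disabled, 3 = default (recover on timeout)
"TdrLevel"=dword:00000000
; TdrDelay: seconds before the watchdog fires (default is 2)
"TdrDelay"=dword:0000000f
```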

Then I opened Nsight Compute and profiled SOL metrics on a simple kernel. However, the screen froze and I had to hard-reset the PC.

With "WDDM TDR Enabled" set back to "True" and "WDDM TDR Delay" changed from 2 to 15, profiling with Nsight Compute makes the screen freeze for 15 seconds, then go black for 1-2 seconds, and then the session resumes but the profiling fails.

I am not sure what is wrong. I have used Nsight Compute from CUDA 10.2 and 11.0 on Windows 10 2004, running with a 2080 Ti (WDDM, rendering remote desktop sessions) without problems. I am unsure why this current system, with a 2060 KO, can't be profiled.

Any additional ideas?

Thank you in advance

EDIT: added the "reboot the machine" step

Did you reboot the machine in-between changing this option?

Yes! I did, I will update my message for future reference.