I am brand new to this, and I was trying out the PyTorch DCGAN tutorial (DCGAN Tutorial, PyTorch Tutorials 1.12.1+cu102 documentation). Based on Task Manager, it seemed to be CPU-bound and hardly used the GPU at all, so I wanted to see what NVIDIA’s profiler says. The tutorial normally takes 5-10 minutes to run when I launch the Python application from the terminal. To profile it, I launched:
The program appears to run as normal, but when it gets to the training portion, it slows to a crawl, never getting beyond the first hundred images, while the compute exe takes almost an entire CPU core. Is this expected behavior? I would have thought that the profiler would add 1-10% overhead, not essentially freeze the application.
Nsight Compute collects HW and SW performance metrics for CUDA kernels. Depending on which metrics are selected, kernels may need to be replayed multiple times since not all metrics can be collected in a single pass. This requires that the tool save and restore state such as allocated memory for deterministic results, or limit the GPU clock frequency for stable measurements.
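To make the replay cost concrete, here is a rough back-of-the-envelope sketch in Python. It is only a toy model, not how Nsight Compute accounts for time: the function name, the parameters, and the idea of a fixed per-pass save/restore cost are all assumptions made for illustration.

```python
def estimate_profiled_runtime(kernel_launches, avg_kernel_seconds,
                              passes_per_kernel, per_pass_overhead_seconds):
    """Toy estimate of GPU-side time when every kernel is replayed.

    All parameters are hypothetical inputs: each launch is assumed to be
    replayed `passes_per_kernel` times, and each pass is assumed to pay a
    fixed save/restore cost on top of the kernel's own run time.
    """
    per_kernel = passes_per_kernel * (avg_kernel_seconds + per_pass_overhead_seconds)
    return kernel_launches * per_kernel


if __name__ == "__main__":
    # Made-up numbers, chosen only to show the shape of the slowdown.
    baseline = 100_000 * 50e-6
    profiled = estimate_profiled_runtime(100_000, 50e-6, 30, 5e-3)
    print(f"unprofiled: {baseline:.1f} s, profiled: {profiled:.1f} s")
```

With many short kernels, the per-pass overhead dominates, which is consistent with a training loop that seems to freeze rather than slow down by a few percent.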
The tool you might be looking for is Nsight Systems (NVIDIA Nsight Systems | NVIDIA Developer), which collects a trace of all CPU and GPU activities with low overhead to give you a realistic timeline view of the application. If from that tool you determine that certain CUDA kernels need optimization, you can selectively profile them with Nsight Compute for further performance analysis.
Thanks! I can’t seem to find the CLI for Nsight Systems for Windows devices. Does one exist? Am I correct in thinking that Nsight Systems needs to start the program itself? From the GUI it seems like you can’t start profiling in the middle of an application; you have to have the GUI start the program itself.
There is no command line interface for Windows, yet. You can check the “Features” table on NVIDIA Nsight Systems | NVIDIA Developer to see the exact differences between the platforms.
Yes, you need to launch the program from within Nsight Systems (similar to how you previously started it from within Nsight Compute). Nsight Systems allows you to start without profiling and enable that at a later point, if you want to skip some initial part of the execution, but it needs to be launched from within the tool in any case.
Okay, so I managed to sort of profile my application. The problem is that there were no GPU samples! So I printed out print("device: " + ("cuda" if torch.cuda.is_available() else "cpu") + "\n")
and found that my program was running on the CPU when under Nsight Systems, and on CUDA when not. My guess was that PyTorch was using CUDA 10.1 and Nsight was using CUDA 10.2. So I installed the older version of Nsight, and now my program uses CUDA while under Nsight. However, I get the following warnings in the diagnostics summary:
Injection DESKTOP-41SHKD0 (:1:0:1964) 00:01.431
Unsupported CUDA driver version: 10020.
Warning Injection DESKTOP-41SHKD0 (:1:0:1964) 00:01.431
CUDA injection initialization failed.
Warning Analysis DESKTOP-41SHKD0 (:1:0:1964) 00:47.179
CUDA profiling might have not been started correctly.
Warning Analysis DESKTOP-41SHKD0 (:1:0:1964) 00:47.179
Zero CUDA events were collected. Does the application use CUDA?
And again no GPU traces. I uninstalled all CUDA 10.2 APIs, so my current thinking is that I need to uninstall graphics driver 441.22 and install an older one:
C:\Users\kulcy>C:\Windows\System32\nvidia-smi.exe
Sun Mar 29 18:59:04 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 441.22 Driver Version: 441.22 CUDA Version: 10.2 |
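The 10020 in the "Unsupported CUDA driver version" warning and the "CUDA Version: 10.2" reported by nvidia-smi refer to the same thing: CUDA encodes versions as 1000 * major + 10 * minor. A minimal Python sketch of the decoding (the function name is made up for illustration):

```python
def decode_cuda_version(encoded):
    """Split an integer CUDA version (1000 * major + 10 * minor) into parts."""
    major, remainder = divmod(encoded, 1000)
    minor = remainder // 10
    return major, minor


if __name__ == "__main__":
    major, minor = decode_cuda_version(10020)
    print(f"CUDA {major}.{minor}")  # prints "CUDA 10.2"
```

So the warning is saying that the profiler's injection library does not support the CUDA 10.2 driver that is installed.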
Per the GUI-about, I am using Version: 2018.3.3.31-00ef8f4 Windows-x64.
This is the older version that I installed in an attempt to use 10.1 to match PyTorch. I don’t have the new version installed anymore, but it was the latest available on the NVIDIA website as of about a week ago.
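Rather than guessing which CUDA version PyTorch uses, it can be read from PyTorch itself. The sketch below is one way to do that; the helper name and the dictionary keys are invented for this example, while `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` are the standard PyTorch attributes. Running it both inside and outside the profiler should show whether the environment differs.

```python
import importlib.util


def describe_cuda_environment():
    """Report what PyTorch knows about CUDA (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not importable from this interpreter at all.
        return {"torch_installed": False}
    import torch
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA toolkit version PyTorch was built against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # Whether a usable CUDA device is visible to this process right now.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_cuda_environment().items():
        print(f"{key}: {value}")
```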
Nsys does not appear to be on the command line. I thought Nsight Systems didn’t have a CLI for Windows?
C:\Users\kulcy>nsys -version
'nsys' is not recognized as an internal or external command,
operable program or batch file.
I downloaded Nsight Systems 2020.1, then I started the trace. While it was processing the output, I checked the log and noticed that it had appended to the old output. So I deleted the log and clicked the stop button, and since then I haven’t been able to get it to collect at all. When I click on start, it gives me a pop-up like so:
When I press refresh or select on the computer, it just hangs. I have tried restarting the computer, and I also tried uninstalling and reinstalling nsight systems. Same issue.
Nsight Systems 2020.1 should indeed support tracing of a CUDA 10.2 application.
My suggestions are below:
#1: Uninstall Nsight Systems, delete the folder it was installed in, and re-install Nsight Systems.
#2: If the above does not work then could you please capture a log of the Nsight Systems host app (this is slightly different from the log you captured a few days ago):
Make a copy of the nvlog.config.template file from the host-windows-x64 directory in the same host-windows-x64 directory.
Rename nvlog.config - Copy.template as nvlog.config.
Run a collection. The file nsight-sys.log should be created in the host-windows-x64 directory.
Send me that file.
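The copy-and-rename steps above can also be scripted. This is a minimal Python sketch, assuming you pass in the path to your own host-windows-x64 directory (the install location is not given in this thread, so the example deliberately leaves it as a parameter); the function name is made up for illustration.

```python
import shutil
from pathlib import Path


def enable_host_logging(host_dir):
    """Copy nvlog.config.template to nvlog.config inside host_dir.

    host_dir is the host-windows-x64 directory of your Nsight Systems
    install; its location depends on your machine and is not assumed here.
    """
    host_dir = Path(host_dir)
    template = host_dir / "nvlog.config.template"
    config = host_dir / "nvlog.config"
    if not template.is_file():
        raise FileNotFoundError(f"template not found: {template}")
    # The template itself is left untouched; only the copy is created.
    shutil.copyfile(template, config)
    return config
```

After running a collection, nsight-sys.log should appear in that same directory. Writing into the install directory may require an elevated prompt.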
BTW, there is a new release out - Nsight Systems 2020.2. You might want to give it a try.
Okay, I installed 2020.2 and uninstalled 2020.1. I can now record traces again, but I am back to the issue that the Python program doesn’t think there is a GPU. Attached is the log. nsight-sys.log (54.9 KB)
Interestingly, the cmd window that pops up when I run the program no longer shows the program output, just a black screen. This feels like a regression from 2020.1. It is clearly doing something, though, since the CPU is at 100%. Diagnostics summary has the following:
11432 -00:00.173
Process 11432 was launched by the profiler
Information Analysis 00:00.000
Profiling has started.
Information Daemon 11432 00:00.000
Process 11432 was launched by the profiler
Information Daemon 11432 00:00.000
Profiler attached to the process.
Information Injection 11432 00:01.259
Common injection library initialized successfully.
Information Injection 11432 00:01.304
CUDA injection initialized successfully.
Information Injection 11432 00:39.996
Number of CUPTI events produced: 2, CUPTI buffers: 20.
Information Injection 11432 00:39.997
Number of CUPTI events produced: 4, CUPTI buffers: 20.
Information Analysis 00:42.865
Profiling has stopped.
Information Analysis 11432 02:11.557
Number of CUDA events collected: 2.
Warning Analysis 11432 02:11.557
Vulkan profiling might have not been started correctly.
Warning Analysis 11432 02:11.557
No Vulkan events collected. Does the process use Vulkan?
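When comparing several runs, it can help to pull the CUDA event count out of the pasted diagnostics text instead of reading it by eye. The following Python sketch only matches the two message wordings that appear in this thread; it is not an official parser, and the function name is invented for the example.

```python
import re


def count_collected_cuda_events(diagnostics_text):
    """Return the CUDA event count found in diagnostics text, or None.

    Only two message forms seen in this thread are recognized:
    "Number of CUDA events collected: N." and
    "Zero CUDA events were collected."
    """
    match = re.search(r"Number of CUDA events collected:\s*(\d+)", diagnostics_text)
    if match:
        return int(match.group(1))
    if "Zero CUDA events were collected" in diagnostics_text:
        return 0
    return None


if __name__ == "__main__":
    sample = "Information Analysis 11432 02:11.557\nNumber of CUDA events collected: 2.\n"
    print(count_collected_cuda_events(sample))  # prints 2
```

A count of 2 after roughly 40 seconds of training, as in the summary above, fits the earlier observation that the program is running on the CPU while under the profiler.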
Nsight Systems 2020.2 (and later) captures the standard output and error streams of the target application. You can see the output to these streams by selecting “Files” from the views drop-list (which shows the Timeline view by default).