This is my first time using Nsight Systems to profile my code, and I noticed a warning about cuDNN 9 in the Diagnostics Summary. Does Nsight Systems not support cuDNN 9?
I should point out that I can see `_5x_cudnn_ampere_scudnn_128x64_relu_xregs_large_nn_v1` showing up as a kernel in the Timeline and Events View. Does that mean everything is working correctly and I can ignore the warning?
System information:
- Ubuntu 24.04 (Docker)
- Python 3.11.11
- PyTorch 2.5.1 (built from source):
  - CUDA 12.6
  - cuDNN 9.6.0.74-1
- NVIDIA Nsight Systems 2025.1.1.103-251135427971v0
Full Diagnostics Summary:
Source | Process ID | Time | Description | |
---|---|---|---|---|
Daemon | | -00:00.001 | Unable to configure the collection of CPU IP/backtrace samples, context switch data, or event sampling data. Try the ‘nsys status --environment’ command to learn more. | |
Analysis | | 00:00.000 | Profiling has started. | |
Daemon | 1447 | 00:00.002 | Process was launched by the profiler, see /tmp/nvidia/nsight_systems/quadd_session_101435/streams/pid_1447_stdout.log and stderr.log for program output | |
Injection | 1447 | 00:00.020 | Common injection library initialized successfully. | |
Analysis | | 00:00.025 | No MMAP events were received for process with pid 1447 before attempting to resolve symbol. This might cause symbols to remain unresolved for the process. | |
Injection | 1447 | 00:00.028 | OS runtime libraries injection initialized successfully. | |
Analysis | | 00:00.036 | Scheduling information is absent. The thread activity is deduced based on OS runtime libraries traces. This is inaccurate and does not take into account asynchronous interrupts and exception faults. | |
Analysis | 1447 | 00:00.036 | No NVTX events collected. Does the process use NVTX? | |
Analysis | 1447 | 00:00.036 | Number of CUDA events collected: 2.084. | |
Analysis | 1447 | 00:00.036 | Number of OS runtime libraries events collected: 4.095. | |
Analysis | 1447 | 00:00.036 | cuDNN profiling might have not been started correctly. | |
Analysis | 1447 | 00:00.036 | No cuDNN events collected. Does the process use cuDNN? | |
Injection | 1447 | 00:00.606 | Tracing cuDNN library version 90.6 is currently not supported. Loading ‘/opt/nvidia/nsight-systems-cli/2025.1.1/target-linux-x64/libToolsInjectionCuDNN64_90.so’ failed: dlopen hook: ‘/opt/nvidia/nsight-systems-cli/2025.1.1/target-linux-x64/libToolsInjectionCuDNN64_90.so’: cannot open shared object file: No such file or directory. | |
Injection | 1447 | 00:00.607 | cuDNN symbols found in /lib/x86_64-linux-gnu/libcudnn_graph.so.9.6.0 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked? | |
Injection | 1447 | 00:01.746 | Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call. See --flush-on-cudaprofilerstop to control this behavior. | |
Injection | 1447 | 00:01.761 | Loaded CUPTI library: /opt/nvidia/nsight-systems-cli/2025.1.1/target-linux-x64/libcupti.so.12.8 | |
Injection | 1447 | 00:02.071 | Enabling trace for device graph launch | |
Injection | 1447 | 00:02.080 | CUDA injection initialized successfully. | |
Injection | 1447 | 00:02.298 | cuDNN symbols found in /lib/x86_64-linux-gnu/libcudnn_graph.so.9.6.0 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked? | |
Injection | 1447 | 00:02.299 | cuDNN symbols found in /lib/x86_64-linux-gnu/libcudnn_graph.so.9.6.0 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked? | |
Injection | 1447 | 00:02.304 | cuDNN symbols found in /lib/x86_64-linux-gnu/libcudnn_graph.so.9.6.0 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked? | |
Injection | 1447 | 00:03.154 | Number of CUPTI events produced: 2.161, CUPTI buffers: 50. | |
Analysis | | 00:03.498 | Profiling has stopped. | |
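
The dlopen failure at 00:00.606 names a specific injection library that could not be found. As a rough way to narrow this down, the sketch below lists which cuDNN injection libraries are actually present in the Nsight Systems target directory. The directory is taken from the log above. The glob pattern is an assumption: it presumes that every cuDNN injection library follows the `libToolsInjectionCuDNN64_<version>.so` naming seen in the error message. The function name `find_cudnn_injection_libs` is made up for illustration.

```python
from pathlib import Path

# Directory copied from the dlopen error in the Diagnostics Summary.
NSYS_TARGET_DIR = "/opt/nvidia/nsight-systems-cli/2025.1.1/target-linux-x64"


def find_cudnn_injection_libs(target_dir):
    """Return the sorted file names of cuDNN injection libraries in target_dir.

    Assumption: the libraries are named libToolsInjectionCuDNN64_<version>.so,
    as in the error message. Other naming schemes would not be matched.
    """
    directory = Path(target_dir)
    if not directory.is_dir():
        # The directory does not exist (for example, a different install path).
        return []
    return sorted(p.name for p in directory.glob("libToolsInjectionCuDNN64_*.so"))


if __name__ == "__main__":
    libs = find_cudnn_injection_libs(NSYS_TARGET_DIR)
    if libs:
        print("cuDNN injection libraries found:")
        for name in libs:
            print("  " + name)
    else:
        print("No cuDNN injection libraries found in " + NSYS_TARGET_DIR)
```

If `libToolsInjectionCuDNN64_90.so` is missing from the output, that would at least be consistent with the "cannot open shared object file" message. It would not by itself say whether CUDA kernel tracing is affected.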
nsys command:
nsys profile -t cuda,nvtx,osrt,cudnn,cublas -x true -o cudnn_test --cuda-event-trace=false python cudnn.py
cudnn.py
import torch
input_tensor = torch.randn(64, 3, 224, 224).cuda()
for _ in range(100):
    _ = torch.nn.functional.conv2d(input_tensor, torch.randn(64, 3, 3, 3).cuda())
(Also, I tried to tag this topic with cudnn, but it wouldn’t let me.)