I’m using the nsys CLI and have tried all kinds of options over numerous trials, but in the end 1) I get broken backtraces, as shown in the screenshot, and 2) I couldn’t get “pt_main_thread” to appear.
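For problem 2, it may help to check outside nsys whether the main thread is actually named “pt_main_thread” at runtime. A minimal sketch, assuming Linux and assuming (as this post implies) that the PyTorch build in the container renames the main thread when it is imported; the helper `thread_names` is hypothetical. It only reads the kernel-visible thread names from /proc.

```python
import os
from pathlib import Path


def thread_names(pid: int) -> dict[int, str]:
    """Map each thread ID of process `pid` to its kernel-visible name (Linux /proc only)."""
    names: dict[int, str] = {}
    for task in Path(f"/proc/{pid}/task").iterdir():
        try:
            # `comm` holds the thread name, truncated by the kernel to 15 characters.
            names[int(task.name)] = (task / "comm").read_text().strip()
        except (FileNotFoundError, ProcessLookupError):
            # The thread exited between listing the directory and reading its name.
            continue
    return names


if __name__ == "__main__":
    # Assumption: add `import torch` above this line (as file.py presumably does)
    # to see whether importing PyTorch renames the main thread.
    pid = os.getpid()
    for tid, name in sorted(thread_names(pid).items()):
        suffix = "  <- main thread" if tid == pid else ""
        print(f"{tid}: {name}{suffix}")
```

If the main thread already shows “pt_main_thread” here but not in the report, that would point at the profiling side rather than the program.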
I’m using the nvcr.io/nvidia/pytorch:24.09-py3 container, started with:
docker run -dit --gpus all --cap-add=SYS_ADMIN nvcr.io/nvidia/pytorch:24.09-py3
/workspace# nsys status -e
Timestamp counter supported: No
CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 1
Linux Distribution = Ubuntu
Linux Kernel Version = 6.8.0-52-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK
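Every check above that reports a verdict passes except the timestamp-counter line. A small sketch for spotting such lines automatically; it assumes the `Name: verdict` layout shown above, treats `Key = value` lines without a colon as informational, and the set of positive verdicts is my own guess rather than anything nsys documents.

```python
# Output of `nsys status -e` as pasted above.
STATUS_TEXT = """\
Timestamp counter supported: No
CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 1
Linux Distribution = Ubuntu
Linux Kernel Version = 6.8.0-52-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK
"""

# Assumed set of verdicts that mean "this check passed".
POSITIVE = {"ok", "available", "enabled", "yes"}


def non_passing_checks(status_output: str) -> list[str]:
    """Return the names of checks whose verdict (text after the last ':') is not positive."""
    flagged = []
    for line in status_output.splitlines():
        if ":" not in line:
            continue  # headings and "Key = value" lines carry no pass/fail verdict
        check, verdict = line.rsplit(":", 1)
        if verdict.strip().lower() not in POSITIVE:
            flagged.append(check.strip())
    return flagged


print(non_passing_checks(STATUS_TEXT))
```

Whether the missing timestamp counter has anything to do with the broken backtraces is not established by this output alone.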
nsys profile -o report.nsys-rep --trace=cuda,nvtx,osrt,cudnn --sample=process-tree --cpuctxsw=process-tree --cuda-graph-trace=graph --backtrace=fp --python-backtrace=cuda python file.py
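Since the post mentions many trials with different options, one way to keep such trials comparable is to generate one command per backtrace method, each writing its own report. A sketch under assumptions: `fp`, `lbr` and `dwarf` are values I believe the nsys 2024.x `--backtrace` option accepts (confirm with `nsys profile --help`), the `report_<method>` names are hypothetical, and every other flag is copied unchanged from the command above.

```python
import shlex

# Flags copied from the command in this post; only --backtrace and -o vary per trial.
COMMON_FLAGS = [
    "--trace=cuda,nvtx,osrt,cudnn",
    "--sample=process-tree",
    "--cpuctxsw=process-tree",
    "--cuda-graph-trace=graph",
    "--python-backtrace=cuda",
]


def nsys_command(backtrace: str, script: str = "file.py") -> list[str]:
    """Build an argv list for one profiling run; nsys options must precede the target app."""
    return [
        "nsys", "profile",
        "-o", f"report_{backtrace}.nsys-rep",  # hypothetical per-method report name
        *COMMON_FLAGS,
        f"--backtrace={backtrace}",
        "python", script,
    ]


if __name__ == "__main__":
    for method in ("fp", "lbr", "dwarf"):
        # Print instead of running; pass the list to subprocess.run(..., check=True) to execute.
        print(shlex.join(nsys_command(method)))
```

The application and its arguments stay last because nsys passes everything from `python` onward to the profiled program.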
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        On  |   00000000:00:08.0 Off |                  N/A |
|  0%   46C    P8             17W /  170W |       2MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
nsys-report-2e68.zip (13.2 MB)
I have read all the related posts but couldn’t get it to work. Any help would be appreciated.
Log (Messages):
Columns: severity, Source, Process ID, Time, Description (each entry's description is on the line below its header).
Information Daemon -00:00.034
Frame pointer backtraces collected.
Information Daemon -00:00.034
Event ‘Reference Cycles’, with sampling period 1000000, used to trigger process-tree CPU IP sample collection.
Information Daemon -00:00.000
1 CPU IP samples collected for every CPU IP backtrace collected.
Information Analysis 00:00.000
Profiling has started.
Information Daemon 1226 00:00.000
Process was launched by the profiler, see /tmp/nvidia/nsight_systems/quadd_session_101214/streams/pid_1226_stdout.log and stderr.log for program output
Information Injection 1226 00:00.030
Common injection library initialized successfully.
Information Injection 1226 00:00.053
OS runtime libraries injection initialized successfully.
Warning Injection 1226 00:00.854
Tracing cuDNN library version 90.1 is currently not supported.Loading ‘/usr/local/cuda-12.6/NsightSystems-cli-2024.4.2/target-linux-x64/libToolsInjectionCuDNN64_90.so’ failed: dlopen hook: ‘/usr/local/cuda-12.6/NsightSystems-cli-2024.4.2/target-linux-x64/libToolsInjectionCuDNN64_90.so’: cannot open shared object file: No such file or directory.
Warning Injection 1226 00:00.854
cuDNN symbols found in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn_graph.so.9 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked?
Warning Injection 1226 00:02.752
cuDNN symbols found in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.9 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked?
Warning Injection 1226 00:02.800
cuDNN symbols found in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.9 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked?
Warning Injection 1226 00:05.529
cuDNN symbols found in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.9 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked?
Information Injection 1328 00:05.570
Common injection library initialized successfully.
Information Injection 1328 00:05.591
OS runtime libraries injection initialized successfully.
Information Injection 1332 00:05.612
Common injection library initialized successfully.
Information Injection 1332 00:05.622
OS runtime libraries injection initialized successfully.
Information Injection 1339 00:06.242
Common injection library initialized successfully.
Information Injection 1338 00:06.242
Common injection library initialized successfully.
Information Injection 1339 00:06.262
OS runtime libraries injection initialized successfully.
Information Injection 1338 00:06.264
OS runtime libraries injection initialized successfully.
Information Injection 1346 00:06.295
Common injection library initialized successfully.
Information Injection 1346 00:06.314
OS runtime libraries injection initialized successfully.
Information Injection 1351 00:06.336
Common injection library initialized successfully.
Information Injection 1351 00:06.344
OS runtime libraries injection initialized successfully.
Information Injection 1353 00:06.345
Common injection library initialized successfully.
Information Injection 1353 00:06.363
OS runtime libraries injection initialized successfully.
Information Injection 1364 00:06.432
Common injection library initialized successfully.
Information Injection 1364 00:06.452
OS runtime libraries injection initialized successfully.
Information Injection 1368 00:06.482
Common injection library initialized successfully.
Information Injection 1368 00:06.501
OS runtime libraries injection initialized successfully.
Information Injection 1373 00:07.646
Common injection library initialized successfully.
Information Injection 1373 00:07.669
OS runtime libraries injection initialized successfully.
Warning Injection 1373 00:08.449
Tracing cuDNN library version 90.1 is currently not supported.Loading ‘/usr/local/cuda-12.6/NsightSystems-cli-2024.4.2/target-linux-x64/libToolsInjectionCuDNN64_90.so’ failed: dlopen hook: ‘/usr/local/cuda-12.6/NsightSystems-cli-2024.4.2/target-linux-x64/libToolsInjectionCuDNN64_90.so’: cannot open shared object file: No such file or directory.
Warning Injection 1373 00:08.449
cuDNN symbols found in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn_graph.so.9 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked?
Warning Injection 1373 00:10.533
cuDNN symbols found in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.9 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked?
Warning Injection 1373 00:10.581
cuDNN symbols found in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.9 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked?
Warning Injection 1373 00:13.314
cuDNN symbols found in /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.9 symbol table. No cuDNN trace will be generated from that library. Was cuDNN statically linked?
Information Injection 1451 00:13.355
Common injection library initialized successfully.
Information Injection 1451 00:13.365
OS runtime libraries injection initialized successfully.
Information Injection 1455 00:13.383
Common injection library initialized successfully.
Information Injection 1455 00:13.391
OS runtime libraries injection initialized successfully.
Information Injection 1461 00:14.004
Common injection library initialized successfully.
Information Injection 1462 00:14.005
Common injection library initialized successfully.
Information Injection 1461 00:14.024
OS runtime libraries injection initialized successfully.
Information Injection 1462 00:14.025
OS runtime libraries injection initialized successfully.
Information Injection 1469 00:14.052
Common injection library initialized successfully.
Information Injection 1469 00:14.071
OS runtime libraries injection initialized successfully.
Information Injection 1476 00:14.094
Common injection library initialized successfully.
Information Injection 1474 00:14.097
Common injection library initialized successfully.
Information Injection 1476 00:14.103
OS runtime libraries injection initialized successfully.
Information Injection 1474 00:14.117
OS runtime libraries injection initialized successfully.
Information Injection 1487 00:14.212
Common injection library initialized successfully.
Information Injection 1487 00:14.232
OS runtime libraries injection initialized successfully.
Information Injection 1491 00:14.264
Common injection library initialized successfully.
Information Injection 1491 00:14.283
OS runtime libraries injection initialized successfully.
Warning Analysis 1226 00:15.789
NVTX function nvtxDomainDestroy was called with wrong domain ID argument.
Warning Analysis 1461 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1564 00:15.789
No NVTX events collected. Does the process use NVTX?
Information Analysis 1226 00:15.789
Number of NVTX events collected: 21.
Warning Analysis 1568 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1353 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1455 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1476 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1339 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1328 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1451 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1462 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1368 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1491 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1487 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1332 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1351 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1474 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1373 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1346 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1469 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1364 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1338 00:15.789
No NVTX events collected. Does the process use NVTX?
Warning Analysis 1461 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1461 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1564 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1564 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1226 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1226 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1568 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1568 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1353 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1353 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1455 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1455 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1476 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1476 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1339 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1339 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1328 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1328 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1451 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1451 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1462 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1462 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1368 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1368 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1491 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1491 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1487 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1487 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1332 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1332 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1351 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1351 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1474 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1474 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1373 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1373 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1346 00:15.789
CUDA profiling might have not been started correctly.
Warning Analysis 1346 00:15.789
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1469 00:15.790
CUDA profiling might have not been started correctly.
Warning Analysis 1469 00:15.790
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1364 00:15.790
CUDA profiling might have not been started correctly.
Warning Analysis 1364 00:15.790
No CUDA events collected. Does the process use CUDA?
Warning Analysis 1338 00:15.790
CUDA profiling might have not been started correctly.
Warning Analysis 1338 00:15.790
No CUDA events collected. Does the process use CUDA?
Information Analysis 1461 00:15.790
Number of OS runtime libraries events collected: 7.
Warning Analysis 1564 00:15.790
No OS runtime libraries events collected. Does the process use OS runtime libraries?
Information Analysis 1226 00:15.790
Number of OS runtime libraries events collected: 95,033.
Warning Analysis 1568 00:15.790
No OS runtime libraries events collected. Does the process use OS runtime libraries?
Information Analysis 1353 00:15.790
Number of OS runtime libraries events collected: 5.
Information Analysis 1455 00:15.790
Number of OS runtime libraries events collected: 3.
Information Analysis 1476 00:15.790
Number of OS runtime libraries events collected: 5.
Information Analysis 1339 00:15.790
Number of OS runtime libraries events collected: 5.
Information Analysis 1328 00:15.790
Number of OS runtime libraries events collected: 101.
Information Analysis 1451 00:15.790
Number of OS runtime libraries events collected: 101.
Information Analysis 1462 00:15.790
Number of OS runtime libraries events collected: 5.
Information Analysis 1368 00:15.790
Number of OS runtime libraries events collected: 4.
Information Analysis 1491 00:15.790
Number of OS runtime libraries events collected: 4.
Information Analysis 1487 00:15.790
Number of OS runtime libraries events collected: 4.
Information Analysis 1332 00:15.790
Number of OS runtime libraries events collected: 3.
Information Analysis 1351 00:15.790
Number of OS runtime libraries events collected: 110.
Information Analysis 1474 00:15.790
Number of OS runtime libraries events collected: 278.
Information Analysis 1373 00:15.790
Number of OS runtime libraries events collected: 88,936.
Warning Analysis 1346 00:15.790
No OS runtime libraries events collected. Does the process use OS runtime libraries?
Warning Analysis 1469 00:15.790
No OS runtime libraries events collected. Does the process use OS runtime libraries?
Information Analysis 1364 00:15.790
Number of OS runtime libraries events collected: 4.
Information Analysis 1338 00:15.790
Number of OS runtime libraries events collected: 7.
Warning Analysis 1461 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1461 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1564 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1564 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1226 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1226 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1568 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1568 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1353 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1353 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1455 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1455 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1476 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1476 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1339 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1339 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1328 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1328 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1451 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1451 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1462 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1462 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1368 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1368 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1491 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1491 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1487 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1487 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1332 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1332 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1351 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1351 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1474 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1474 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1373 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1373 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1346 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1346 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1469 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1469 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1364 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1364 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Warning Analysis 1338 00:15.790
cuDNN profiling might have not been started correctly.
Warning Analysis 1338 00:15.790
No cuDNN events collected. Does the process use cuDNN?
Information Injection 1564 00:26.946
Common injection library initialized successfully.
Information Injection 1564 00:26.966
OS runtime libraries injection initialized successfully.
Information Injection 1568 00:26.997
Common injection library initialized successfully.
Information Injection 1568 00:27.016
OS runtime libraries injection initialized successfully.
Information Injection 1226 00:54.827
NVTX injection initialized successfully.
Information Analysis 00:58.865
Profiling has stopped.
Information Daemon 00:59.934
Number of IP samples collected: 153,511.
Warning Daemon 00:59.934
The operating system throttled the collection of sampling data 58918 times.
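The log above is long but mostly repetitive. A sketch that parses its two-line entry format (header line, then description) and pulls out per-process OS runtime event counts and the processes that reported a given message; the regular expressions assume exactly the layout pasted here, and `SAMPLE` is a short excerpt of it.

```python
import re

# Header line: severity, source, optional process ID, time (e.g. "Warning Analysis 1226 00:15.789").
HEADER = re.compile(
    r"^(Information|Warning|Error)\s+(\w+)\s+(?:(\d+)\s+)?(-?\d{2}:\d{2}\.\d{3})$"
)
OSRT_COUNT = re.compile(r"Number of OS runtime libraries events collected: ([\d,]+)\.")


def parse_entries(log_text: str):
    """Yield (severity, source, pid or None, time, description) for each two-line entry."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    i = 0
    while i < len(lines) - 1:
        match = HEADER.match(lines[i])
        if match:
            severity, source, pid, time = match.groups()
            yield severity, source, int(pid) if pid else None, time, lines[i + 1]
            i += 2
        else:
            i += 1  # column headings or other non-entry lines


def osrt_counts(log_text: str) -> dict[int, int]:
    """Map PID -> number of OS runtime library events reported for it."""
    counts = {}
    for _, _, pid, _, description in parse_entries(log_text):
        match = OSRT_COUNT.search(description)
        if match and pid is not None:
            counts[pid] = int(match.group(1).replace(",", ""))
    return counts


def pids_reporting(log_text: str, phrase: str) -> list[int]:
    """Sorted PIDs whose entries contain `phrase`."""
    return sorted({pid for _, _, pid, _, description in parse_entries(log_text)
                   if pid is not None and phrase in description})


SAMPLE = """\
Information Analysis 1226 00:15.790
Number of OS runtime libraries events collected: 95,033.
Information Analysis 1373 00:15.790
Number of OS runtime libraries events collected: 88,936.
Information Analysis 1455 00:15.790
Number of OS runtime libraries events collected: 3.
Warning Analysis 1226 00:15.789
No CUDA events collected. Does the process use CUDA?
Information Daemon 00:59.934
Number of IP samples collected: 153,511.
"""

if __name__ == "__main__":
    print(osrt_counts(SAMPLE))
    print(pids_reporting(SAMPLE, "No CUDA events collected"))
```

Run against the full log, this would show PIDs 1226 and 1373 with 95,033 and 88,936 OS runtime events and every other PID with at most 278. Those two are also the only processes that emit the cuDNN warnings, and every process in the log, including 1226, reports “No CUDA events collected”.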