For unknown reasons, when I use Nsight Compute on my MacBook to remotely profile a Linux machine, it always gets stuck at the following step:
Checking file deployment: libInterceptorInjectionTarget.so
Checking file deployment: libTreeLauncherPlaceholder.so
Checking file deployment: libTreeLauncherTargetInjection.so
Checking file deployment: libTreeLauncherTargetUpdatePreloadInjection.so
Checking file deployment: TreeLauncherTargetLdPreloadHelper
Checking file deployment: libcuda-injection.so
Checking file deployment: libInterceptorInjectionTarget.so
Checking file deployment: libnvperf_host.so
Checking file deployment: libnvperf_target.so
Checking file deployment: libTreeLauncherPlaceholder.so
Checking file deployment: libTreeLauncherTargetInjection.so
Checking file deployment: libTreeLauncherTargetUpdatePreloadInjection.so
Checking file deployment: ncu
Checking file deployment: TreeLauncherSubreaper
Checking file deployment: TreeLauncherTargetLdPreloadHelper
Checking file deployment: C2CLink.section
Checking file deployment: ComputeWorkloadAnalysis.section
Checking file deployment: InstructionStatistics.section
Checking file deployment: LaunchStatistics.section
Checking file deployment: MemoryWorkloadAnalysis.section
Checking file deployment: MemoryWorkloadAnalysis_Chart.section
Checking file deployment: MemoryWorkloadAnalysis_Tables.section
Checking file deployment: NumaAffinity.section
Checking file deployment: Nvlink.section
Checking file deployment: Nvlink_Tables.section
Checking file deployment: Nvlink_Topology.section
Checking file deployment: Occupancy.section
Checking file deployment: SchedulerStatistics.section
Checking file deployment: SourceCounters.section
Checking file deployment: SpeedOfLight.section
Checking file deployment: SpeedOfLight_HierarchicalDoubleRooflineChart.section
Checking file deployment: SpeedOfLight_HierarchicalHalfRooflineChart.section
Checking file deployment: SpeedOfLight_HierarchicalSingleRooflineChart.section
Checking file deployment: SpeedOfLight_HierarchicalTensorRooflineChart.section
Checking file deployment: SpeedOfLight_RooflineChart.section
Checking file deployment: WarpStateStatistics.section
Checking file deployment: CPIStall.py
Checking file deployment: FPInstructions.py
Checking file deployment: HighPipeUtilization.py
Checking file deployment: IssueSlotUtilization.py
Checking file deployment: LaunchStatistics.py
Checking file deployment: MemoryApertureUsage.py
Checking file deployment: MemoryCacheAccessPattern.py
Checking file deployment: MemoryL2Compression.py
Checking file deployment: NvRules.py
Checking file deployment: Occupancy.py
Checking file deployment: PCSamplingData.py
Checking file deployment: SharedMemoryConflicts.py
Checking file deployment: SlowPipeLimiter.py
Checking file deployment: SpeedOfLight.py
Checking file deployment: SpeedOfLight_Roofline.py
Checking file deployment: ThreadDivergence.py
Checking file deployment: UncoalescedAccess.chart
Checking file deployment: UncoalescedAccess.py
Checking file deployment: UncoalescedSharedAccess.chart
Checking file deployment: UncoalescedSharedAccess.py
Started SSH reverse proxy on port: 50152
Launching: /home/sean/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu
Process launched
==PROF== Attempting to connect to ncu-ui at 127.0.0.1:50152...
==PROF== Connected to ncu-ui at 127.0.0.1:50152.
I tried running the same command locally on the Linux machine, and it gets stuck as well:
~$ ~/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu --config-file off --export ~/tmp/var/query_state --force-overwrite --target-processes all --kernel-name regex:.*wd.* --section-folder ~/tmp/var/sections --section C2CLink --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Tables --section NumaAffinity --section Nvlink --section Nvlink_Tables --section Nvlink_Topology --section Occupancy --section SchedulerStats --section SourceCounters --section SpeedOfLight --section SpeedOfLight_HierarchicalDoubleRooflineChart --section SpeedOfLight_HierarchicalHalfRooflineChart --section SpeedOfLight_HierarchicalSingleRooflineChart --section SpeedOfLight_HierarchicalTensorRooflineChart --section SpeedOfLight_RooflineChart --section WarpStateStats python
There is no output and no error whatsoever until I press Ctrl-C, at which point it exits with:
==WARNING== No kernels were profiled.
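To make the hang reproducible without having to press Ctrl-C, one option is to wrap the run in GNU coreutils `timeout`, which exits with status 124 when its time limit is reached. This is a minimal sketch assuming a Linux host with GNU coreutils; `run_with_limit` is a hypothetical helper name. Redirecting stdin from `/dev/null` is also an assumption: it should make the bare `python` target read EOF and exit right away instead of sitting at the interactive prompt (provided ncu passes its stdin through to the target), so a remaining hang would point at ncu itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with a time limit and report a hang.
# Assumes GNU coreutils `timeout`, which exits with status 124 on timeout.
run_with_limit() {
  local limit=$1
  shift
  # stdin from /dev/null so an interactive target (e.g. bare `python`)
  # reads EOF and exits instead of waiting for input
  timeout "$limit" "$@" </dev/null
  local status=$?
  if [ "$status" -eq 124 ]; then
    echo "HUNG: '$1' did not finish within ${limit}s" >&2
  fi
  return "$status"
}
```

Usage: `run_with_limit 120 ~/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu` followed by the same arguments as in the command above. An exit status of 124 would confirm that the deployed ncu stalls even when the target program has nothing to wait for.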
I’m using ncu version 2023.2.2.0 because it is the latest version compatible with CUDA Toolkit 12.2:
~/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2018-2023 NVIDIA Corporation
Version 2023.2.2.0 (build 33188574) (public-release)
~$ nvidia-smi
Wed Feb 5 00:48:00 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2 |
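To compare the deployed ncu with the one shipped in the CUDA Toolkit, both versions can be printed side by side. This sketch relies only on `ncu --version` as used above; `print_ncu_versions` is a hypothetical name, and it assumes the version string is the last line of the output, as in the output shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version line of each ncu binary given.
# Assumes the "Version ..." line is the last line of `ncu --version`.
print_ncu_versions() {
  local bin
  for bin in "$@"; do
    if [ -x "$bin" ]; then
      printf '%s: %s\n' "$bin" "$("$bin" --version | tail -n 1)"
    else
      printf '%s: not found\n' "$bin"
    fi
  done
}
```

Usage: `print_ncu_versions ~/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu /usr/local/cuda/bin/ncu`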
Interestingly, with the ncu that ships with the CUDA Toolkit, the same command proceeds and the Python prompt appears:
/usr/local/cuda/bin/ncu --config-file off --export ~/tmp/var/query_state --force-overwrite --target-processes all --kernel-name regex:.*wd.* --section-folder ~/tmp/var/sections --section C2CLink --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Tables --section NumaAffinity --section Nvlink --section Nvlink_Tables --section Nvlink_Topology --section Occupancy --section SchedulerStats --section SourceCounters --section SpeedOfLight --section SpeedOfLight_HierarchicalDoubleRooflineChart --section SpeedOfLight_HierarchicalHalfRooflineChart --section SpeedOfLight_HierarchicalSingleRooflineChart --section SpeedOfLight_HierarchicalTensorRooflineChart --section SpeedOfLight_RooflineChart --section WarpStateStats python
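Since the only difference between the hanging run and the working run is the ncu binary, it can help to build the argument list once and pass the binary as a parameter, so both runs are guaranteed to use identical flags. This is a minimal bash sketch; `run_ncu` and `DRY_RUN` are hypothetical names, and the paths and sections are the ones from the commands above. It also quotes `regex:.*wd.*` so the shell cannot glob-expand it.

```shell
#!/usr/bin/env bash
# Sections passed to ncu in the commands above.
SECTIONS=(
  C2CLink ComputeWorkloadAnalysis InstructionStats LaunchStats
  MemoryWorkloadAnalysis MemoryWorkloadAnalysis_Chart MemoryWorkloadAnalysis_Tables
  NumaAffinity Nvlink Nvlink_Tables Nvlink_Topology Occupancy SchedulerStats
  SourceCounters SpeedOfLight SpeedOfLight_HierarchicalDoubleRooflineChart
  SpeedOfLight_HierarchicalHalfRooflineChart SpeedOfLight_HierarchicalSingleRooflineChart
  SpeedOfLight_HierarchicalTensorRooflineChart SpeedOfLight_RooflineChart WarpStateStats
)

# Hypothetical helper: run_ncu <ncu-binary> <target program and its args...>
# With DRY_RUN=1 it only prints the shell-quoted command.
run_ncu() {
  local ncu_bin=$1
  shift
  local args=(
    --config-file off
    --export "$HOME/tmp/var/query_state"
    --force-overwrite
    --target-processes all
    --kernel-name 'regex:.*wd.*'
    --section-folder "$HOME/tmp/var/sections"
  )
  local s
  for s in "${SECTIONS[@]}"; do
    args+=(--section "$s")
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%q ' "$ncu_bin" "${args[@]}" "$@"
    printf '\n'
  else
    "$ncu_bin" "${args[@]}" "$@"
  fi
}
```

Usage: `DRY_RUN=1 run_ncu /usr/local/cuda/bin/ncu python` shows the exact command, then `run_ncu ~/tmp/var/target/linux-desktop-glibc_2_11_3-x64/ncu python` and `run_ncu /usr/local/cuda/bin/ncu python` run the two variants with identical arguments.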
Python 3.11.4 (main, Dec 7 2023, 15:43:41) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
This happens consistently on two machines, so I suspect I’m missing something. What could it be?