Nsight in holohub not working

Hi,
I work on a project using Holoscan, and I use the HoloHub container. I tried to test performance with Nsight using the argument --nsys_profile, but at start the error ./run: line 85: nsys: command not found appears. I tried to install it using a Dockerfile, and now, depending on the version, I get either:
- With NVIDIA Nsight Systems version 2025.1.1.131-251135540420v0: WARNING: Device-side CUDA Event completion trace is currently enabled. This may increase runtime overhead and the likelihood of false dependencies across CUDA Streams. If you wish to avoid this, please disable the feature with --cuda-event-trace=false. followed by nvmlSystemGetDriverVersion failed: Not Supported
- With NVIDIA Nsight Systems version 2024.6.1.90-246134905481v0: Failed to probe the process (sync). Timeout: 75 sec

Hope you can help me.

Ignore the warning about device-side CUDA event completion trace. It was intended to explain behavior in a rare corner case, and as of the next version we are pulling it.

@mhallock I’m not familiar with what HoloHub is shipping, can you please help?

Hi @valentin.massebeuf,

What is the base container you are using for Holoscan? The current default Holoscan base container should, I believe, already include nsys; the nsys version it ships is 2024.4.2.

Can you see if you are able to reproduce the issue with a fresh dev container and any of the sample applications?

As mentioned, you can ignore the event completion trace part of the message, but the “nvmlSystemGetDriverVersion failed: Not Supported” part gives me pause. What is the CUDA driver version on your system?

Finally, the error you got with 2025.1.1 could be caused by a couple of different things. Let's start with something simple: from within the dev container, can you run nsys profile sleep 1 without error?
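A guarded sanity-check sequence along these lines (the output path is just an example) would confirm both that nsys is on PATH and that a basic trace completes:

```shell
# Guarded nsys sanity checks: the script completes even when nsys is absent.
if command -v nsys >/dev/null 2>&1; then
    nsys --version                             # confirm the CLI is on PATH
    nsys status --environment                  # report OS/driver support for tracing
    nsys profile -o /tmp/nsys_sanity sleep 1   # minimal end-to-end profile
    nsys_state="present"
else
    nsys_state="missing"
fi
echo "nsys is ${nsys_state}"
```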

I use the HoloHub container from the GitHub repository, which I build with sudo ./dev_container build and launch with sudo ./dev_container launch --nsys_profile. When I try to run nsys --version I get the error bash: nsys: command not found.

That’s why I tried to install it during the container build, with the argument --docker_file <my_dockerfile_path>/Dockerfile. My Dockerfile is a copy of the base Dockerfile with this added:

ARG NSYS_URL=https://developer.download.nvidia.com/devtools/nsight-systems/
ARG NSYS_PKG=nsight-systems-2025.1.1_2025.1.1.131-1_arm64.deb

RUN apt-get update && apt-get install --no-install-recommends -y \
    libxcb-xinerama0 \
    libxcb-cursor0 \
    libnss3 \
    libxcomposite1 \
    libxdamage1 \
    libxtst6 \
    && rm -rf /var/lib/apt/lists/*

RUN wget ${NSYS_URL}${NSYS_PKG} && dpkg -i $NSYS_PKG && rm $NSYS_PKG

I tried this based on this post and the errors that followed.

This seems to work, since I can now run nsys --version, but the errors I described in my first post still appear, even with the command nsys profile sleep 1.
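For reference, a rough check like the following (version number taken from the reply above, not verified) reports the manually installed nsys version so it can be compared against the 2024.4.2 build the base image is said to ship; a large mismatch with the Jetson driver stack may contribute to probe failures:

```shell
# Extract the installed nsys version string, falling back gracefully when
# nsys is not on PATH. Compare the result against 2024.4.2 (the version the
# Holoscan base container reportedly includes).
installed=$(nsys --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)
echo "installed nsys: ${installed:-not found}"
```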

Here is the CUDA driver version I have:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:25_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
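Note that nvcc reports the CUDA toolkit version rather than the kernel driver version. A guarded sketch for querying the driver itself (which path works depends on the platform):

```shell
# nvcc shows the CUDA *toolkit* version, not the GPU driver version.
# Query the actual kernel driver version, guarding each platform-specific path.
if [ -r /proc/driver/nvidia/version ]; then
    driver_info=$(cat /proc/driver/nvidia/version)   # Jetson/iGPU and dGPU Linux
elif command -v nvidia-smi >/dev/null 2>&1; then
    driver_info=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
else
    driver_info="NVIDIA driver version not detectable in this environment"
fi
echo "$driver_info"
```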

Thanks for the extra info! A couple of additional details would help me get a better picture of your setup.

Can you provide the output from ./dev_container launch --verbose, and also the run command with the verbose flag added? You are welcome to DM it to me if you’d prefer not to post publicly, in case any of the output seems sensitive. Specifically, I’d like to see the Holoscan SDK version and how it’s constructing the nsys call.

Here is the output of ./dev_container launch --verbose

sudo ./dev_container launch --verbose
2025-03-20 09:00:03 $ xhost +local:docker
non-network local connections being added to access control list
Launch (HOLOHUB_ROOT: /home/massebeuf/holoscan/holohub)...
Launch (gpu_type: igpu)...
Launch (mount_device_opt:  --device=/dev/video0  --device=/dev/capture-vi-channel71  --device=/dev/capture-vi-channel70  --device=/dev/capture-vi-channel69  --device=/dev/capture-vi-channel68  --device=/dev/capture-vi-channel67  --device=/dev/capture-vi-channel66  --device=/dev/capture-vi-channel65  --device=/dev/capture-vi-channel64  --device=/dev/capture-vi-channel63  --device=/dev/capture-vi-channel62  --device=/dev/capture-vi-channel61  --device=/dev/capture-vi-channel60  --device=/dev/capture-vi-channel59  --device=/dev/capture-vi-channel58  --device=/dev/capture-vi-channel57  --device=/dev/capture-vi-channel56  --device=/dev/capture-vi-channel55  --device=/dev/capture-vi-channel54  --device=/dev/capture-vi-channel53  --device=/dev/capture-vi-channel52  --device=/dev/capture-vi-channel51  --device=/dev/capture-vi-channel50  --device=/dev/capture-vi-channel49  --device=/dev/capture-vi-channel48  --device=/dev/capture-vi-channel47  --device=/dev/capture-vi-channel46  --device=/dev/capture-vi-channel45  --device=/dev/capture-vi-channel44  --device=/dev/capture-vi-channel43  --device=/dev/capture-vi-channel42  --device=/dev/capture-vi-channel41  --device=/dev/capture-vi-channel40  --device=/dev/capture-vi-channel39  --device=/dev/capture-vi-channel38  --device=/dev/capture-vi-channel37  --device=/dev/capture-vi-channel36  --device=/dev/capture-vi-channel35  --device=/dev/capture-vi-channel34  --device=/dev/capture-vi-channel33  --device=/dev/capture-vi-channel32  --device=/dev/capture-vi-channel31  --device=/dev/capture-vi-channel30  --device=/dev/capture-vi-channel29  --device=/dev/capture-vi-channel28  --device=/dev/capture-vi-channel27  --device=/dev/capture-vi-channel26  --device=/dev/capture-vi-channel25  --device=/dev/capture-vi-channel24  --device=/dev/capture-vi-channel23  --device=/dev/capture-vi-channel22  --device=/dev/capture-vi-channel21  --device=/dev/capture-vi-channel20  --device=/dev/capture-vi-channel19  --device=/dev/capture-vi-channel18  
--device=/dev/capture-vi-channel17  --device=/dev/capture-vi-channel16  --device=/dev/capture-vi-channel15  --device=/dev/capture-vi-channel14  --device=/dev/capture-vi-channel13  --device=/dev/capture-vi-channel12  --device=/dev/capture-vi-channel11  --device=/dev/capture-vi-channel10  --device=/dev/capture-vi-channel9  --device=/dev/capture-vi-channel8  --device=/dev/capture-vi-channel7  --device=/dev/capture-vi-channel6  --device=/dev/capture-vi-channel5  --device=/dev/capture-vi-channel4  --device=/dev/capture-vi-channel3  --device=/dev/capture-vi-channel2  --device=/dev/capture-vi-channel1  --device=/dev/capture-vi-channel0  --device=/dev/snd/controlC2  --device=/dev/snd/pcmC2D0c  --device=/dev/snd/controlC1  --device=/dev/snd/pcmC1D19c  --device=/dev/snd/pcmC1D19p  --device=/dev/snd/pcmC1D18c  --device=/dev/snd/pcmC1D18p  --device=/dev/snd/pcmC1D17c  --device=/dev/snd/pcmC1D17p  --device=/dev/snd/pcmC1D16c  --device=/dev/snd/pcmC1D16p  --device=/dev/snd/pcmC1D15c  --device=/dev/snd/pcmC1D15p  --device=/dev/snd/pcmC1D14c  --device=/dev/snd/pcmC1D14p  --device=/dev/snd/pcmC1D13c  --device=/dev/snd/pcmC1D13p  --device=/dev/snd/pcmC1D12c  --device=/dev/snd/pcmC1D12p  --device=/dev/snd/pcmC1D11c  --device=/dev/snd/pcmC1D11p  --device=/dev/snd/pcmC1D10c  --device=/dev/snd/pcmC1D10p  --device=/dev/snd/pcmC1D9c  --device=/dev/snd/pcmC1D9p  --device=/dev/snd/pcmC1D8c  --device=/dev/snd/pcmC1D8p  --device=/dev/snd/pcmC1D7c  --device=/dev/snd/pcmC1D7p  --device=/dev/snd/pcmC1D6c  --device=/dev/snd/pcmC1D6p  --device=/dev/snd/pcmC1D5c  --device=/dev/snd/pcmC1D5p  --device=/dev/snd/pcmC1D4c  --device=/dev/snd/pcmC1D4p  --device=/dev/snd/pcmC1D3c  --device=/dev/snd/pcmC1D3p  --device=/dev/snd/pcmC1D2c  --device=/dev/snd/pcmC1D2p  --device=/dev/snd/pcmC1D1c  --device=/dev/snd/pcmC1D1p  --device=/dev/snd/pcmC1D0c  --device=/dev/snd/pcmC1D0p  --device=/dev/snd/controlC0  --device=/dev/snd/pcmC0D9p  --device=/dev/snd/pcmC0D8p  --device=/dev/snd/pcmC0D7p  
--device=/dev/snd/pcmC0D3p  --device=/dev/snd/timer  --mount=source=/etc/asound.conf,target=/etc/asound.conf,readonly,type=bind  --group-add=29)...
Launch (conditional_opt:  --volume=/opt/yuan/qcap/include:/opt/yuan/qcap/include  --volume=/opt/yuan/qcap/lib:/opt/yuan/qcap/lib  --volume=/usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra  --group-add=44  --group-add=104  --group-add=1002  --rm  --device=/dev/nvgpu/igpu0/nvsched  --device=/dev/nvhost-ctxsw-gpu  --device=/dev/nvhost-nvsched-gpu  --device=/dev/nvhost-sched-gpu  --device=/dev/nvidia0  --device=/dev/nvidia-modeset  -e CUPY_CACHE_DIR=/workspace/holohub/.cupy/kernel_cache)...
Launch (display_server_opt:  -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY)...
Launch (local_sdk_opt:  -e PYTHONPATH=/opt/nvidia/holoscan/python/lib:/workspace/holohub/benchmarks/holoscan_flow_benchmarking)...
Launch (ucx_opt:  --ipc=host  --cap-add=CAP_SYS_PTRACE  --ulimit=memlock=-1  --ulimit=stack=67108864)...
Launch (docker_opts: )...
Launch (image: holohub:ngc-v3.0.0-igpu)...
Launch (trailing args: )...
2025-03-20 09:00:03 $ docker run --net host --interactive --tty -u 0:0 -v /etc/group:/etc/group:ro -v /etc/passwd:/etc/passwd:ro -v /home/massebeuf/holoscan/holohub:/workspace/holohub -w /workspace/holohub --runtime=nvidia --gpus all --cap-add CAP_SYS_PTRACE --ipc=host -v /dev:/dev --device-cgroup-rule c 81:* rmw --device-cgroup-rule c 189:* rmw -e NVIDIA_DRIVER_CAPABILITIES=graphics,video,compute,utility,display -e HOME=/workspace/holohub --device=/dev/video0 --device=/dev/capture-vi-channel71 --device=/dev/capture-vi-channel70 --device=/dev/capture-vi-channel69 --device=/dev/capture-vi-channel68 --device=/dev/capture-vi-channel67 --device=/dev/capture-vi-channel66 --device=/dev/capture-vi-channel65 --device=/dev/capture-vi-channel64 --device=/dev/capture-vi-channel63 --device=/dev/capture-vi-channel62 --device=/dev/capture-vi-channel61 --device=/dev/capture-vi-channel60 --device=/dev/capture-vi-channel59 --device=/dev/capture-vi-channel58 --device=/dev/capture-vi-channel57 --device=/dev/capture-vi-channel56 --device=/dev/capture-vi-channel55 --device=/dev/capture-vi-channel54 --device=/dev/capture-vi-channel53 --device=/dev/capture-vi-channel52 --device=/dev/capture-vi-channel51 --device=/dev/capture-vi-channel50 --device=/dev/capture-vi-channel49 --device=/dev/capture-vi-channel48 --device=/dev/capture-vi-channel47 --device=/dev/capture-vi-channel46 --device=/dev/capture-vi-channel45 --device=/dev/capture-vi-channel44 --device=/dev/capture-vi-channel43 --device=/dev/capture-vi-channel42 --device=/dev/capture-vi-channel41 --device=/dev/capture-vi-channel40 --device=/dev/capture-vi-channel39 --device=/dev/capture-vi-channel38 --device=/dev/capture-vi-channel37 --device=/dev/capture-vi-channel36 --device=/dev/capture-vi-channel35 --device=/dev/capture-vi-channel34 --device=/dev/capture-vi-channel33 --device=/dev/capture-vi-channel32 --device=/dev/capture-vi-channel31 --device=/dev/capture-vi-channel30 --device=/dev/capture-vi-channel29 
--device=/dev/capture-vi-channel28 --device=/dev/capture-vi-channel27 --device=/dev/capture-vi-channel26 --device=/dev/capture-vi-channel25 --device=/dev/capture-vi-channel24 --device=/dev/capture-vi-channel23 --device=/dev/capture-vi-channel22 --device=/dev/capture-vi-channel21 --device=/dev/capture-vi-channel20 --device=/dev/capture-vi-channel19 --device=/dev/capture-vi-channel18 --device=/dev/capture-vi-channel17 --device=/dev/capture-vi-channel16 --device=/dev/capture-vi-channel15 --device=/dev/capture-vi-channel14 --device=/dev/capture-vi-channel13 --device=/dev/capture-vi-channel12 --device=/dev/capture-vi-channel11 --device=/dev/capture-vi-channel10 --device=/dev/capture-vi-channel9 --device=/dev/capture-vi-channel8 --device=/dev/capture-vi-channel7 --device=/dev/capture-vi-channel6 --device=/dev/capture-vi-channel5 --device=/dev/capture-vi-channel4 --device=/dev/capture-vi-channel3 --device=/dev/capture-vi-channel2 --device=/dev/capture-vi-channel1 --device=/dev/capture-vi-channel0 --device=/dev/snd/controlC2 --device=/dev/snd/pcmC2D0c --device=/dev/snd/controlC1 --device=/dev/snd/pcmC1D19c --device=/dev/snd/pcmC1D19p --device=/dev/snd/pcmC1D18c --device=/dev/snd/pcmC1D18p --device=/dev/snd/pcmC1D17c --device=/dev/snd/pcmC1D17p --device=/dev/snd/pcmC1D16c --device=/dev/snd/pcmC1D16p --device=/dev/snd/pcmC1D15c --device=/dev/snd/pcmC1D15p --device=/dev/snd/pcmC1D14c --device=/dev/snd/pcmC1D14p --device=/dev/snd/pcmC1D13c --device=/dev/snd/pcmC1D13p --device=/dev/snd/pcmC1D12c --device=/dev/snd/pcmC1D12p --device=/dev/snd/pcmC1D11c --device=/dev/snd/pcmC1D11p --device=/dev/snd/pcmC1D10c --device=/dev/snd/pcmC1D10p --device=/dev/snd/pcmC1D9c --device=/dev/snd/pcmC1D9p --device=/dev/snd/pcmC1D8c --device=/dev/snd/pcmC1D8p --device=/dev/snd/pcmC1D7c --device=/dev/snd/pcmC1D7p --device=/dev/snd/pcmC1D6c --device=/dev/snd/pcmC1D6p --device=/dev/snd/pcmC1D5c --device=/dev/snd/pcmC1D5p --device=/dev/snd/pcmC1D4c --device=/dev/snd/pcmC1D4p --device=/dev/snd/pcmC1D3c 
--device=/dev/snd/pcmC1D3p --device=/dev/snd/pcmC1D2c --device=/dev/snd/pcmC1D2p --device=/dev/snd/pcmC1D1c --device=/dev/snd/pcmC1D1p --device=/dev/snd/pcmC1D0c --device=/dev/snd/pcmC1D0p --device=/dev/snd/controlC0 --device=/dev/snd/pcmC0D9p --device=/dev/snd/pcmC0D8p --device=/dev/snd/pcmC0D7p --device=/dev/snd/pcmC0D3p --device=/dev/snd/timer --mount=source=/etc/asound.conf,target=/etc/asound.conf,readonly,type=bind --group-add=29 --volume=/opt/yuan/qcap/include:/opt/yuan/qcap/include --volume=/opt/yuan/qcap/lib:/opt/yuan/qcap/lib --volume=/usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra --group-add=44 --group-add=104 --group-add=1002 --rm --device=/dev/nvgpu/igpu0/nvsched --device=/dev/nvhost-ctxsw-gpu --device=/dev/nvhost-nvsched-gpu --device=/dev/nvhost-sched-gpu --device=/dev/nvidia0 --device=/dev/nvidia-modeset -e CUPY_CACHE_DIR=/workspace/holohub/.cupy/kernel_cache -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY -e PYTHONPATH=/opt/nvidia/holoscan/python/lib:/workspace/holohub/benchmarks/holoscan_flow_benchmarking --ipc=host --cap-add=CAP_SYS_PTRACE --ulimit=memlock=-1 --ulimit=stack=67108864 holohub:ngc-v3.0.0-igpu

=========================
== NVIDIA Holoscan SDK ==
=========================

NVIDIA Holoscan SDK Version: 3.0.0
Container image Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Refer to /opt/nvidia/legal for inherited CUDA and TensorRT container licenses and copyrights.

This container includes the Holoscan libraries, GXF extensions, headers, example source
code, and sample datasets, as well as all Holoscan SDK dependencies.

Visit the User Guide to get started with the Holoscan SDK:
 https://docs.nvidia.com/holoscan/sdk-user-guide/getting_started.html

Python, C++, and GXF examples are installed in /opt/nvidia/holoscan/examples alongside their source
code, and run instructions:
 https://github.com/nvidia-holoscan/holoscan-sdk/tree/main/examples#readme.

See the HoloHub repository for a collection of Holoscan operators and applications:
 https://github.com/nvidia-holoscan/holohub

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.6 driver version 560.28.03 with kernel driver version 540.3.0.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

I ran build and then launch on a test app (the app loads a video and measures FPS with asynchronous parallelism). Here is the output:

~# ./run launch my_test --nsys_profile --verbose
Default language for my_test selected: cpp
Run environment: export PYTHONPATH=${PYTHONPATH}:/opt/nvidia/holoscan/lib/cmake/holoscan/../../../python/lib:/workspace/holohub/build/my_test/python/lib:/workspace/holohub && export HOLOHUB_DATA_PATH=/workspace/holohub/data && export HOLOSCAN_INPUT_PATH=/opt/nvidia/holoscan/data
Run workdir: cd /workspace/holohub/build/my_test
Run command: nsys profile --trace=cuda,vulkan,nvtx,osrt /workspace/holohub/build/my_test/applications/my_test/my_test --data /workspace/holohub/data/endoscopy /workspace/holohub/build/my_test/applications/my_test/my_test.yaml
Run command args: 
[command] export PYTHONPATH=${PYTHONPATH}:/opt/nvidia/holoscan/lib/cmake/holoscan/../../../python/lib:/workspace/holohub/build/my_test/python/lib:/workspace/holohub && export HOLOHUB_DATA_PATH=/workspace/holohub/data && export HOLOSCAN_INPUT_PATH=/opt/nvidia/holoscan/data
[command] cd /workspace/holohub/build/my_test
[command] nsys profile --trace=cuda,vulkan,nvtx,osrt /workspace/holohub/build/my_test/applications/my_test/my_test --data /workspace/holohub/data/endoscopy /workspace/holohub/build/my_test/applications/my_test/my_test.yaml
WARNING: Device-side CUDA Event completion trace is currently enabled.
         This may increase runtime overhead and the likelihood of false
         dependencies across CUDA Streams. If you wish to avoid this, please
         disable the feature with --cuda-event-trace=false.
nvmlSystemGetDriverVersion failed: Not Supported
./run: line 85:  3878 Aborted                 (core dumped) nsys profile --trace=cuda,vulkan,nvtx,osrt /workspace/holohub/build/my_test/applications/my_test/my_test --data /workspace/holohub/data/endoscopy /workspace/holohub/build/my_test/applications/my_test/my_test.yaml
[command] export PYTHONPATH=/opt/nvidia/holoscan/python/lib:/workspace/holohub/benchmarks/holoscan_flow_benchmarking && export HOLOHUB_DATA_PATH="" && export HOLOSCAN_INPUT_PATH="/opt/nvidia/holoscan/data"

Thank you again! It looks like this is a Tegra? Which platform exactly are you working on?
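If it helps, a couple of Jetson-specific files usually identify the platform; both are absent on non-Tegra systems, hence the guards:

```shell
# Identify the Tegra platform from Jetson-specific files, skipping any
# that do not exist on the current system.
if [ -r /etc/nv_tegra_release ]; then
    cat /etc/nv_tegra_release                      # L4T / JetPack release string
fi
if [ -r /proc/device-tree/model ]; then
    tr -d '\0' < /proc/device-tree/model; echo     # board model name
fi
platform_check=done
```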

Here is my jtop info: