DLProf not working due to an older version of nsys-cli

When I try to install DLProf with the latest version of pip, it installs an old version of nsys.

pip --version
pip 25.0.1 from /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/pip (python 3.10)

⚡ ~ pip install nvidia-dlprof
Looking in indexes:
Collecting nvidia-dlprof
Downloading nvidia_dlprof-1.8.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 14.2 MB/s eta 0:00:00
WARNING: Ignoring version 1.8.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.8.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.7.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 82.8 MB/s eta 0:00:00
WARNING: Ignoring version 1.7.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.7.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.6.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 27.4 MB/s eta 0:00:00
WARNING: Ignoring version 1.6.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.6.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.5.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 10.1 MB/s eta 0:00:00
WARNING: Ignoring version 1.5.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.5.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.4.0-py3-none-any.whl (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 1.5 MB/s eta 0:00:00
WARNING: Ignoring version 1.4.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.4.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.3.0-py3-none-any.whl (1.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 6.2 MB/s eta 0:00:00
WARNING: Ignoring version 1.3.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.3.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.2.0-py3-none-any.whl (951 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 951.5/951.5 kB 12.5 MB/s eta 0:00:00
WARNING: Ignoring version 1.2.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.2.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.1.0-py3-none-any.whl (5.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 32.9 MB/s eta 0:00:00
WARNING: Ignoring version 1.1.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.1.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.0.0-py3-none-any.whl (5.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.9/5.9 MB 28.2 MB/s eta 0:00:00
WARNING: Ignoring version 1.0.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.0.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-0.19.0-py3-none-any.whl (36.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36.6/36.6 MB 67.9 MB/s eta 0:00:00
WARNING: Ignoring version 0.19.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-0.19.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-0.18.0-py3-none-any.whl (36.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36.0/36.0 MB 58.2 MB/s eta 0:00:00
WARNING: Ignoring version 0.18.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-0.18.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-0.17.0-py3-none-any.whl (26.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.5/26.5 MB 46.9 MB/s eta 0:00:00
Downloading nvidia_dlprof-1.8.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 586.6 MB/s eta 0:00:00
WARNING: Ignoring version 1.8.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.8.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.7.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 541.0 MB/s eta 0:00:00
WARNING: Ignoring version 1.7.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.7.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.6.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 510.4 MB/s eta 0:00:00
WARNING: Ignoring version 1.6.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.6.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.5.0-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 563.3 MB/s eta 0:00:00
WARNING: Ignoring version 1.5.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.5.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.4.0-py3-none-any.whl (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 505.6 MB/s eta 0:00:00
WARNING: Ignoring version 1.4.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.4.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.3.0-py3-none-any.whl (1.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 560.1 MB/s eta 0:00:00
WARNING: Ignoring version 1.3.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.3.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.2.0-py3-none-any.whl (951 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 951.5/951.5 kB 479.1 MB/s eta 0:00:00
WARNING: Ignoring version 1.2.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.2.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.1.0-py3-none-any.whl (5.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 456.8 MB/s eta 0:00:00
WARNING: Ignoring version 1.1.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.1.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-1.0.0-py3-none-any.whl (5.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.9/5.9 MB 420.5 MB/s eta 0:00:00
WARNING: Ignoring version 1.0.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-1.0.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-0.19.0-py3-none-any.whl (36.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36.6/36.6 MB 349.5 MB/s eta 0:00:00
WARNING: Ignoring version 0.19.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-0.19.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Downloading nvidia_dlprof-0.18.0-py3-none-any.whl (36.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36.0/36.0 MB 355.8 MB/s eta 0:00:00
WARNING: Ignoring version 0.18.0 of nvidia-dlprof since it has invalid metadata:
Requested nvidia-dlprof from nvidia_dlprof-0.18.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with == or != operators
nvidia-tensorboard (<2.*,>=1.15.*) ; extra == 'all'
~~~^
Please use pip<24.1 if you need to use this version.
Collecting nvidia-nsys-cli>=2020.4.1.117 (from nvidia-dlprof)
Downloading developer.download.nvidia.com/compute/redist/nvidia-nsys-cli/nvidia_nsys_cli-2021.3.2.12-py3-none-linux_x86_64.whl (76.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76.1/76.1 MB 81.4 MB/s eta 0:00:00
Installing collected packages: nvidia-nsys-cli, nvidia-dlprof
Successfully installed nvidia-dlprof-0.17.0 nvidia-nsys-cli-2021.3.2.12
⚡ ~ nsys --version
NVIDIA Nsight Systems version 2021.3.2.12-9700a21
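The WARNING lines above come from a PEP 440 rule that pip enforces strictly from version 24.1 on: a trailing .* wildcard is only valid with the == and != operators. Every nvidia-dlprof release from 1.8.0 down to 0.18.0 declares nvidia-tensorboard (<2.*,>=1.15.*), so pip skips them all, falls back to 0.17.0, and that release pulls in nvidia-nsys-cli 2021.3.2.12. The helper below is a hypothetical, minimal sketch that checks only this one rule (it is not pip's or packaging's parser) and shows which clauses trip it:

```python
import re

# Hypothetical helper (not part of pip): checks only the PEP 440 rule behind the
# warnings above, i.e. a trailing ".*" wildcard may be used only with the
# == and != operators. It is not a full PEP 440 specifier parser.
_CLAUSE = re.compile(r"^\s*(===|==|!=|~=|<=|>=|<|>)\s*(\S+?)\s*$")


def invalid_wildcard_clauses(specifier: str) -> list:
    """Return the clauses of a comma-separated specifier that misuse '.*'."""
    bad = []
    for clause in specifier.split(","):
        match = _CLAUSE.match(clause)
        if match is None:
            continue  # not a simple "<operator><version>" clause; out of scope
        operator, version = match.groups()
        if operator == "===":
            continue  # arbitrary equality compares plain strings; not checked
        if version.endswith(".*") and operator not in ("==", "!="):
            bad.append(clause.strip())
    return bad


if __name__ == "__main__":
    # Requirement from the nvidia-dlprof wheels, with the ".*" suffixes intact.
    print(invalid_wildcard_clauses("<2.*,>=1.15.*"))  # ['<2.*', '>=1.15.*']
    print(invalid_wildcard_clauses(">=1.15,<2"))      # []
```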

With this, an older version of nsys (2021.3.2.12) gets installed.
When I run DLProf with this version mismatch, I get the following errors:
Processing events...
Saving temporary "/tmp/nsys-report-07de-3619-1edf-52d1.qdstrm" file to disk...

Creating final output files...
Processing [===============================================================100%]

**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/Modules/TraceProcessEvent.cpp(45): Throw in function const string& {anonymous}::GetCudaCallbackName(bool, uint32_t, const QuadDAnalysis::MoreInjection&)\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::InvalidArgumentException>\nstd::exception::what: InvalidArgumentException\n[QuadDCommon::tag_message*] = Unknown driver API function index: 673\n"
      }
    }
  }
}
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/Modules/TraceProcessEvent.cpp(45): Throw in function const string& {anonymous}::GetCudaCallbackName(bool, uint32_t, const QuadDAnalysis::MoreInjection&)\nDynamic exception type: boost::exception_detail::clone_impl<QuadDCommon::InvalidArgumentException>\nstd::exception::what: InvalidArgumentException\n[QuadDCommon::tag_message*] = Unknown driver API function index: 677\n"
      }
    }
  }
}
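The "Unknown driver API function index" errors suggest that the bundled Nsight Systems 2021.3 does not recognize some CUDA driver API calls made in this environment, i.e. the profiler is older than the driver stack it is tracing. A cheap pre-flight check is to parse the nsys --version output and compare it against a minimum before starting a long profiling run. In the sketch below the function names are hypothetical, and MIN_NSYS (2025.2.1.130, the release that worked later in this thread) is an assumption, not an NVIDIA-documented requirement:

```python
import re
import shutil
import subprocess

# Pulls the dotted number out of output such as
# "NVIDIA Nsight Systems version 2021.3.2.12-9700a21".
_NSYS_VERSION = re.compile(r"Nsight Systems version (\d+(?:\.\d+)*)")

# Assumption: the release that worked in this thread, not an official minimum.
MIN_NSYS = (2025, 2, 1, 130)


def parse_nsys_version(text: str) -> tuple:
    """Return the version in `nsys --version` output as a tuple of ints."""
    match = _NSYS_VERSION.search(text)
    if match is None:
        raise ValueError(f"no Nsight Systems version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_new_enough(text: str, minimum: tuple = MIN_NSYS) -> bool:
    """Compare component-wise as integers (so 2021.10 sorts after 2021.9)."""
    return parse_nsys_version(text) >= minimum


if __name__ == "__main__":
    exe = shutil.which("nsys")  # the nsys the shell would run first
    if exe is None:
        print("nsys not found on PATH")
    else:
        out = subprocess.run([exe, "--version"], capture_output=True, text=True).stdout
        print(exe, parse_nsys_version(out), "OK" if is_new_enough(out) else "too old?")
```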

To resolve this, I had to upgrade nsys manually:

  1. wget https://developer.nvidia.com/downloads/assets/tools/secure/nsight-systems/2025_2/nsight-systems-2025.2.1_2025.2.1.130-1_amd64.deb

  2. sudo dpkg -i nsight-systems-2025.2.1_2025.2.1.130-1_amd64.deb

  3. sudo apt --fix-broken install

  4. pip install --upgrade nvidia-nsys-cli

  5. For me, nsys was still reporting the old version, so I had to uninstall the pip package:
    pip uninstall nvidia-nsys-cli
    After that, nsys showed the correct version:

nsys --version
NVIDIA Nsight Systems version 2025.2.1.130-252135690618v0

Then the script started working.
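Step 5 was most likely needed because the nvidia-nsys-cli pip package puts its own nsys launcher in the conda environment's bin directory, which comes earlier on PATH than the system-wide install from the .deb, so the shell kept running the old copy. Listing every nsys on PATH in lookup order (what which -a nsys does in a shell) makes this kind of shadowing visible. A minimal sketch, with a hypothetical function name:

```python
import os
from typing import List, Optional


def find_all_on_path(name: str, path: Optional[str] = None) -> List[str]:
    """List every executable called `name` on PATH, in lookup order.

    Like `which -a`: the first entry is what the shell runs; the rest are
    shadowed. Empty PATH entries (meaning the current directory) are skipped
    for simplicity.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for i, exe in enumerate(find_all_on_path("nsys")):
        print("active:  " if i == 0 else "shadowed:", exe)
```

If more than one line is printed, the first one is the nsys your scripts actually invoke.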

DLProf reached end of life back in 2022. It used Nsight Systems under the covers for all of its data collection, so when you install the (now quite dated) DLProf, it also installs a very old Nsight Systems.

If Nsight Systems covers what you need, you would be better off getting it directly. If it does not give you what you need, please let me know what is missing and I can try to get that work prioritized.

I am interested in measuring the GPU utilization (%) and MFU (model FLOPs utilization) of my distributed training job across multiple GPUs. DLProf reports GPU utilization as a percentage. How can I get MFU and GPU utilization from nsys?
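The two numbers need different inputs. Nsight Systems records when each CUDA kernel starts and ends on each GPU, so a DLProf-style GPU utilization can be approximated as the fraction of a time window in which at least one kernel was running. MFU is not something a trace gives you directly: it is achieved model FLOPs per second divided by the hardware's peak FLOPs per second, so it needs the model's FLOP count. The sketch below assumes kernel (start, end) timestamps in nanoseconds have already been exported from an nsys report (for example with nsys export to SQLite or with nsys stats), and it uses the common 6 x parameters x tokens estimate of training FLOPs for dense transformers. The function names, example numbers, and peak-FLOPs value are hypothetical placeholders, not measurements:

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start_ns, end_ns) of one kernel on one GPU


def busy_fraction(kernels: Iterable[Interval], window: Interval) -> float:
    """Fraction of `window` in which at least one kernel was running on the GPU.

    Kernels that overlap (e.g. on different streams) are merged first, so
    concurrent work is not double-counted; kernels are clipped to the window.
    """
    win_start, win_end = window
    if win_end <= win_start:
        raise ValueError("window must have positive length")
    clipped = sorted(
        (max(s, win_start), min(e, win_end))
        for s, e in kernels
        if e > win_start and s < win_end
    )
    busy = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                busy += cur_end - cur_start  # close the previous merged interval
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)  # overlaps or touches: extend it
    if cur_end is not None:
        busy += cur_end - cur_start
    return busy / (win_end - win_start)


def mfu(n_params: float, tokens_per_step: float, step_time_s: float,
        n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization from the ~6 * N * tokens training-FLOPs estimate.

    Assumptions: dense transformer, forward + backward pass, attention FLOPs
    ignored; tokens_per_step is the global batch size in tokens (all GPUs).
    """
    achieved_flops_per_s = 6.0 * n_params * tokens_per_step / step_time_s
    return achieved_flops_per_s / (n_gpus * peak_flops_per_gpu)


if __name__ == "__main__":
    # Toy trace: two overlapping kernels and one separate kernel in 100 ns.
    print(busy_fraction([(0, 30), (20, 50), (70, 80)], (0, 100)))  # 0.6
    # Hypothetical job: 7e9 params, 1,048,576 tokens/step, 10 s/step, 8 GPUs,
    # placeholder peak of 989e12 FLOP/s per GPU (use your GPU's datasheet value).
    print(f"MFU = {mfu(7e9, 1_048_576, 10.0, 8, 989e12):.1%}")
```

For a multi-GPU job, compute busy_fraction per GPU over the same steady-state window and look at the spread as well as the mean. NCCL communication kernels also count as "busy" here, so a high busy fraction alongside a low MFU can mean the GPUs spend much of their time in communication or in inefficient kernels.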