Trouble using Nsight Systems - Condition variables and threads

Hi everyone,

I am stuck trying to profile my application with Nsight Systems. nvidia-smi reports the following versions:
Driver Version: 525.60.13 CUDA Version: 12.0
My GPU is an RTX A4000 and I am running Ubuntu 22.04.
Profiling the CUDA 12.0 samples (vectorAdd) works well, and Nsight Compute also works well on my application. I tried both Nsight Systems 2022.4.2 (included in the CUDA 12.0 Toolkit) and Nsight Systems 2023.2.1. I also tried profiling from the CLI.
My application is C++/CUDA based and uses 3 threads. For some reason, when profiling with nsys, one of the threads appears to get stuck. How can I debug this, and why does nsys change how the application behaves? Is there a parameter I am missing?

These threads are producers/consumers that pass objects through a FIFO, using condition variables to signal when new elements or free space become available.
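To make the pattern concrete, here is a minimal sketch of that kind of setup: three threads chained through two bounded FIFOs, each with one condition variable for "new element" and one for "space available". It assumes C++17, and every name in it (BoundedFifo, run_pipeline, the doubling stage) is hypothetical and not taken from the real application. It is only meant to illustrate the structure and could serve as a starting point for a small reproducer; whether it hangs under nsys is untested.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical bounded FIFO (illustrative only, not the application's code).
template <typename T>
class BoundedFifo {
public:
    explicit BoundedFifo(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the FIFO is full, then signals "new element".
    void push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push(std::move(value));
        lock.unlock();
        not_empty_.notify_one();
    }

    // Blocks while the FIFO is empty and still open.
    // Returns std::nullopt once the FIFO is closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        lock.unlock();
        not_full_.notify_one();  // signals "space available"
        return value;
    }

    // Tells consumers that no more elements will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_empty_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::condition_variable not_full_;
    std::queue<T> queue_;
    std::size_t capacity_;
    bool closed_ = false;
};

// Three threads: producer -> doubling stage -> consumer.
// Returns the sum of the doubled values 2*1 + 2*2 + ... + 2*count.
std::uint64_t run_pipeline(std::uint64_t count, std::size_t capacity) {
    BoundedFifo<std::uint64_t> first(capacity);
    BoundedFifo<std::uint64_t> second(capacity);
    std::uint64_t sum = 0;  // written only by the consumer, read after join

    std::thread producer([&] {
        for (std::uint64_t i = 1; i <= count; ++i) first.push(i);
        first.close();
    });
    std::thread stage([&] {
        while (auto v = first.pop()) second.push(*v * 2);
        second.close();
    });
    std::thread consumer([&] {
        while (auto v = second.pop()) sum += *v;
    });

    producer.join();
    stage.join();
    consumer.join();
    return sum;
}
```

Calling run_pipeline(1000, 4) from main pushes 1000 items through FIFOs of capacity 4. In this sketch both waits use a predicate, so spurious wakeups are handled and a notification sent before the wait starts is not lost.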

EDIT: Reducing the size of the input data and collecting only CUDA traces with nsys profile --trace=cuda made it work.


@afroger is this likely to be an OSRT issue?

Definitely looks like an OSRT bug.

@baptiste.marty Is it reproducible with nsys profile --trace osrt <program>?
Could you by chance provide a small reproducer?