Starting >5 nsys instances in parallel results in "Agent launcher failed"

When I start multiple nsys instances at the same time (each profiling a separate GPU), starting more than 5 produces the error “Agent launcher failed”. A workaround is to wait around 10 s before starting each additional instance. I am using NVIDIA Nsight Systems version 2022.4.2.50-32196742v0.
I can trigger the error using the following script: for i in $(seq 8); do CUDA_VISIBLE_DEVICES=$i nsys launch --session-new=test_$i top & done
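
The 10 s workaround can be scripted roughly as follows. This is a minimal sketch, not an official fix: the function name staggered_launch and the variables LAUNCH_DELAY and NSYS_BIN are hypothetical names introduced here, and the loop assumes the GPUs are numbered from 0.

```shell
# Hypothetical settings; neither variable is read by nsys itself.
LAUNCH_DELAY="${LAUNCH_DELAY:-10}"   # seconds to wait between launches
NSYS_BIN="${NSYS_BIN:-nsys}"         # path to the nsys executable

# Start one nsys session per GPU index 0..count-1, pausing after each launch.
staggered_launch() {
  count="$1"
  i=0
  while [ "$i" -lt "$count" ]; do
    CUDA_VISIBLE_DEVICES="$i" "$NSYS_BIN" launch --session-new="test_$i" top &
    sleep "$LAUNCH_DELAY"
    i=$((i + 1))
  done
}
```

Calling staggered_launch 8 followed by wait would start eight sessions about 10 s apart. The delay is only the value that happened to work for me, so it may need tuning on other systems.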

Is there a reason you do not run nsys on a single script that launches all of them instead? That would be the more usual way to handle this use case, and it would give you one report file in which you could examine the different GPUs.

Thank you for the quick reply! That would normally be a great solution. Unfortunately, wrapping the code that creates the instances is not an option for us, because we are also launching workers on remote systems.

Okay, can you try explicitly giving nsys a session identifier instead of using the default and see if that changes anything?

--session-new
[a-Z][0-9,a-Z,spaces]
Launch the application in a new session. Name must start with an alphabetical character followed by printable or space characters. Any %q{ENV_VAR} pattern will be substituted with the value of the environment variable. Any %h pattern will be substituted with the hostname of the system. Any %% pattern will be substituted with %.
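
As a rough illustration of the substitution patterns described above, a session name can combine %q{ENV_VAR} and %h so that each GPU on each host gets a distinct name. This is only a sketch: the function name launch_named_session and the variable NSYS_BIN are hypothetical, and the exact name nsys ends up with has not been verified here.

```shell
NSYS_BIN="${NSYS_BIN:-nsys}"   # hypothetical variable: path to the nsys executable

# Launch a command in a new session named after the visible GPU and the host.
launch_named_session() {
  # Single quotes keep the shell from touching the patterns,
  # so nsys receives %q{...} and %h verbatim and substitutes them itself.
  "$NSYS_BIN" launch --session-new='gpu%q{CUDA_VISIBLE_DEVICES}_%h' "$@"
}
```

For example, CUDA_VISIBLE_DEVICES=3 launch_named_session top would request a session name starting with gpu3_ followed by the hostname, assuming the patterns expand as the description states.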

sigh…sorry, neglected to read for content, you are already doing that.

@afroger do you have thoughts here?

Sorry for the delayed reply. I believe this was a known issue that has been fixed in a newer version of Nsight Systems. @joosthooz, can you please try the latest version of Nsight Systems available on the website? I can’t reproduce the problem with the latest version.

Thanks, I’ll try the new version and will create a new question if I’m still having issues.
