nsys hanging on Slurm cluster

I’m trying to use Nsight Systems on our Slurm cluster. With version 2022.3 and later, nsys appears to hang after the profiled program exits, while it is writing the profile file.

I’m running the following:

srun -n 1 -t 00:01:00 --pty nsys profile echo "hello"

This works on 2022.2.1, but on 2022.3.4 and all later versions it prints:

hello
Generating '/tmp/nsys-report-2cc7.qdstrm'

and then hangs.

Changing TMPDIR (as suggested in Troubleshooting) does not appear to help.

Any suggestions for things I might try?


@ztasoulas, I think we have seen this before. Can you give an answer?


@simonbyrne1 Could you please try running with --sample=none and --cpuctxsw=none and check whether the hang goes away?

srun -n 1 -t 00:01:00 --pty nsys profile --sample=none --cpuctxsw=none echo "hello"

Yes, that works correctly.

However, if I drop either of the flags, it hangs again.
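For anyone who wants to repeat this check across flag combinations without babysitting a stuck terminal, here is a minimal sketch. It assumes GNU coreutils `timeout` is available, which exits with status 124 when the time limit is reached. The `check_hang` helper and the 60-second budget are hypothetical, not something from this thread. It also assumes nsys terminates on the default signal that `timeout` sends.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command under a time limit and classify the result.
# Prints "ok" (exit 0), "hang" (time limit reached) or "error" (any other status).
check_hang() {
  local limit=$1; shift
  timeout "$limit" "$@" >/dev/null 2>&1
  case $? in
    0)   echo ok ;;
    124) echo hang ;;   # GNU timeout reports 124 when the limit is hit
    *)   echo error ;;
  esac
}

# Example use inside an allocation (needs nsys on PATH, so left commented out).
# $flags is deliberately unquoted so that it splits into separate arguments.
# for flags in "" "--sample=none" "--cpuctxsw=none" "--sample=none --cpuctxsw=none"; do
#   printf '%-40s %s\n' "[$flags]" "$(check_hang 60 nsys profile $flags echo hello)"
# done
```

Based on the report above, only the last combination would be expected to print "ok" on the affected versions.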

Ok, thanks for confirming!
This is a known bug that we have fixed internally. The fix will be available in an upcoming release.

Until then, the workaround is to set both of the flags above to none. The trade-off is that CPU backtraces and scheduling activity will not be collected.
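If you want to apply the workaround everywhere without retyping the flags, a small wrapper function is one option. This is only a sketch: `nsys_profile_safe` and the `NSYS_BIN` override are hypothetical names introduced here, and the only nsys options used are the two from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: always pass the two workaround flags to `nsys profile`.
# NSYS_BIN is an assumed override so that a specific nsys install can be chosen.
nsys_profile_safe() {
  "${NSYS_BIN:-nsys}" profile --sample=none --cpuctxsw=none "$@"
}

# Example use (needs Slurm and nsys, so left commented out):
# export -f nsys_profile_safe
# srun -n 1 -t 00:01:00 --pty bash -c 'nsys_profile_safe echo "hello"'
```

Once a release with the fix is installed, the wrapper can be dropped so that CPU sampling and context-switch data are collected again.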

Good to know, thank you!
