I have a complex setup with multiple processes using the GPU on a single machine. How can I get a CUDA trace for all of them using Nsight Systems?
I checked the Nsight Systems GUI, but I see no option for this. If I just run it with one of my processes, I do see the other ones in the timeline, but no CUDA trace for them.
I also tried the Nsight Systems CLI with nsys start and nsys stop, but the output file does not contain any CUDA trace.
The easy way to get multiple processes under one Nsys run is to either set process-tree (if they are all launched from the same base process) or profile a script that launches all of them.
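A rough sketch of the launcher-script approach, in case it helps. The script name launch_all.sh, the report name all_procs, and the two echo commands are placeholders I made up; swap in your real processes. It assumes the children are traced as part of the launcher's process tree, and on Windows a batch file or PowerShell script would play the same role.

```shell
#!/bin/sh
# launch_all.sh (hypothetical name): start every command passed as an
# argument in the background, then wait until all of them have exited.
launch_all() {
    for cmd in "$@"; do
        sh -c "$cmd" &
    done
    wait
}

# Placeholder commands; replace these with the real GPU processes.
launch_all "echo worker_a started" "echo worker_b started"

# Assumed usage: profile the launcher so all children land in one report.
#   nsys profile --trace=cuda -o all_procs sh launch_all.sh
```

Because the launcher waits for its children, the profiling session stays alive until the last process has finished.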
1. The CLI commands that I used are: nsys start, followed later by nsys stop.
The processes that use CUDA are active between the two commands above. However, the output nsys file doesn't contain any CUDA trace. Am I missing something, or does this not work on Windows?
2. Creating a script that launches all the processes is an option, but it is very complicated to do in our case, which is why I was looking for alternatives. Assuming we do it, do you know whether process-tree is available only from the CLI or also from the GUI? I searched for it in the GUI, but couldn't find it.
But in general, you should either use the "fire and forget" command with "nsys profile" or you will want to use the interactive commands start, launch, and stop. Unless you use all three, either the application or the profiler does not start.
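For reference, the three interactive commands could be strung together roughly like this. This is only a sketch: ./my_cuda_app is a hypothetical application path, and the script prints the commands by default (DRY_RUN=1) so it can be read and tried on a machine without nsys. Set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Dry-run by default: print the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"
APP="${APP:-./my_cuda_app}"   # hypothetical application path

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

interactive_session() {
    run nsys start --stop-on-exit=false   # arm the profiler, keep it running after the app exits
    run nsys launch --trace=cuda "$APP"   # start the application under the profiler
    run nsys stop                         # end collection and write the report
}

interactive_session
```

The point is that start alone only arms the profiler, and launch alone only starts the application; the trace shows up when both are used and stop closes the session.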
If you are just trying to control what part of the application is traced, I would recommend using a delay command or a duration command and the "nsys profile" command to run your session.
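As an illustration of the delay/duration route, something along these lines should work. The values (10 s delay, 30 s duration), the report name windowed_report, and ./my_cuda_app are all made up for the example, and the script only prints the resulting command rather than running it.

```shell
#!/bin/sh
# Build an "nsys profile" command line that traces only a time window.
# Arguments: delay in seconds, duration in seconds, application to run.
build_profile_cmd() {
    echo "nsys profile --trace=cuda --delay=$1 --duration=$2 -o windowed_report $3"
}

DELAY_S=10               # wait this long after launch before collecting
DURATION_S=30            # then collect for this long
APP="./my_cuda_app"      # hypothetical application path

CMD=$(build_profile_cmd "$DELAY_S" "$DURATION_S" "$APP")
echo "$CMD"
# On a machine with nsys installed, the printed command can be run as-is.
```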
Thanks for your reply. I did some more tests and it's clear now that all 3 interactive commands are needed (start, launch, stop). My initial understanding of the documentation was wrong.
So, I guess our only option for profiling all the processes is to create a script that launches all of them.