We are running a server on the Arm processor that receives jobs from a remote host and runs them on the GPU.
I’d like to profile these jobs with Nsight Systems. However, when the application is run, it only launches the server and then waits for a job, so “nsys profile application” does not detect any GPU activity. GPU activity has to be triggered by the remote host after the application has started on the DPU.
So far, I have not found a way to profile these jobs with Nsight Systems. Any help is greatly appreciated. #NVIDIAInception
The server is running a service process whose responsibility is to listen on some form of IPC or socket, waiting for work to be scheduled.
The client processes communicate with the service process running on the server to tell it to execute some work.
In that case, you need to use “nsys profile” or “nsys launch” to start the service process on the server. Many Nsight Systems features require tool libraries to be injected into the profiled processes, and the only way to do that is to launch the process you want to profile from the GUI or CLI.
If you don’t want to profile for the whole lifetime of the program, you can always use interactive commands: “nsys launch”, “nsys start”, “nsys stop”, etc.
This is helpful.
When I used “nsys profile” or “nsys launch” the profiler was not capturing GPU activity.
However, with “nsys profile --duration” I’m able to submit a job and execute the work within the duration – and the work is captured by the profiler.
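For anyone landing here later, the time-boxed approach described above could be sketched roughly as follows. The binary name ./my_service, the 60-second window, and the report name are placeholders, not values from this thread. The run helper is only an illustration aid: by default it prints the command instead of executing it.

```shell
#!/bin/sh
# Sketch: collect for a fixed window while the remote host submits its job.
# DRY_RUN=1 (the default here) only prints the command; DRY_RUN=0 runs it.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

# profile_for SECONDS REPORT COMMAND [ARGS...]
# Wraps the service command in a time-limited "nsys profile" collection.
profile_for() {
    seconds=$1
    report=$2
    shift 2
    run nsys profile --duration="$seconds" --output="$report" "$@"
}

# ./my_service is a hypothetical server binary; 60 s is an arbitrary window.
profile_for 60 service_report ./my_service
```

The job has to be submitted and finished inside the window for it to show up in the report. Depending on the --kill setting, nsys may also terminate the service when the collection ends, so it is worth checking “nsys profile --help” on your version before relying on this for a long-running server.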
Note that you can use the interactive commands “nsys start” and “nsys stop” on the server to control the profiling session of the service process.
Launch the service under the profiler. Nothing is collected yet at this point; use nsys profile instead if you want collection to start immediately:
$ nsys launch --session-new service [...]
Start the profiling session:
$ nsys start --session service
Stop the profiling session — an “.nsys-rep” file will be generated:
$ nsys stop --session service
You can use nsys start and nsys stop multiple times on the same profiling session. We also have capture triggers available through the --capture-range option. If the jobs include calls to “cudaProfilerStart” and “cudaProfilerStop”, you can set --capture-range cudaProfilerApi so that collection starts and stops automatically around those calls.
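A minimal sketch of the capture-range variant, assuming the job code running inside the service calls cudaProfilerStart() before the GPU work and cudaProfilerStop() after it. As before, ./my_service and the report name are placeholders, and the run helper prints the command by default rather than executing it.

```shell
#!/bin/sh
# Sketch: let cudaProfilerStart()/cudaProfilerStop() in the job code
# delimit the collection instead of a fixed time window.
# DRY_RUN=1 (the default here) only prints the command; DRY_RUN=0 runs it.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

# profile_on_api_range REPORT COMMAND [ARGS...]
# Starts the service under nsys with collection gated on the CUDA profiler API.
profile_on_api_range() {
    report=$1
    shift
    run nsys profile --capture-range=cudaProfilerApi --output="$report" "$@"
}

# ./my_service is a hypothetical server binary.
profile_on_api_range job_report ./my_service
```

With this setup nothing is recorded while the server is idle. What happens after cudaProfilerStop() (for example, whether the session ends or waits for another range) is controlled by the related --capture-range-end option; the supported values differ between versions, so check “nsys profile --help” rather than taking this sketch as definitive.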