Detaching ncu profiler while leaving the profiled program running

Hello.

I am currently using the Nsight Compute CLI (ncu) to profile DNN training. I use the following script to launch the program under the profiler. (I have abbreviated the metrics part, since it is not the core concern of this topic.) When I run this script with the appropriate command-line arguments, both the profiler and the program run fine, and the log and report are created.

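#!/bin/bash
# $1: report output path (passed to -o)
# $2: program/script to profile

path_to_report=$1
path_to_script=$2

# Quote the paths so that arguments containing spaces are passed intact.
/usr/local/NVIDIA-Nsight-Compute/ncu \
	--log-file ./temp-report \
	-o "$path_to_report" \
	--print-summary per-gpu \
	--target-processes all \
	--metrics some_metrics_blah_blah \
	--force \
	"$path_to_script"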

However, I want to exit the profiler (or the profiling process) after a certain time (say, 5 minutes) while keeping the program running. The profiler adds a large overhead and produces a lot of output, so I want to stop profiling but let the DNN training continue. But when I tried to terminate the ncu-related process with the kill command, the training program exited as well, not just the ncu process. Is there any way to achieve this?

One option would be to limit the number of kernels that are profiled and then let the application continue. I’m not sure whether this will remove all of the overhead, but it is easy to try. Adding the flags "--launch-count 7 --kill no" will stop profiling after 7 kernels have been profiled and let the application continue. I’m not sure what number you would want instead of 7, but could you try that and let me know whether it behaves acceptably?
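A minimal sketch of the original script with the two suggested flags added, written as a bash function so it can be sourced and reused. The function name `profile_first_kernels`, the optional third argument for the kernel count (defaulting to 7, as in the suggestion above), and the `NCU` environment variable that overrides the ncu path are hypothetical conveniences, not part of the original script or of Nsight Compute itself.

```shell
#!/bin/bash
# Hypothetical wrapper: profile only the first N kernel launches, then let
# the application keep running without further kernel profiling.
#   $1: report output path (passed to -o)
#   $2: program/script to profile
#   $3: number of kernels to profile (optional, defaults to 7)
# NCU (hypothetical env var) overrides the path to the ncu binary.
profile_first_kernels() {
	local path_to_report=$1
	local path_to_script=$2
	local count=${3:-7}
	local ncu_bin=${NCU:-/usr/local/NVIDIA-Nsight-Compute/ncu}

	# --launch-count stops profiling after $count kernels have been profiled;
	# --kill no lets the application continue afterwards.
	"$ncu_bin" \
		--log-file ./temp-report \
		-o "$path_to_report" \
		--print-summary per-gpu \
		--target-processes all \
		--metrics some_metrics_blah_blah \
		--force \
		--launch-count "$count" \
		--kill no \
		"$path_to_script"
}
```

For example, `profile_first_kernels ./report ./train.sh 50` would profile the first 50 kernels and then let training continue. Whether this removes all of the profiler's overhead for the rest of the run is not guaranteed.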

Thanks for the reply, @jmarusarz.

I am wondering whether there is a ‘duration’ option, or some other way to profile the program for a given amount of time. The appropriate ‘launch-count’ cannot be determined before actually running the program, so I want to profile every DNN for a fixed time interval.

As of now, that functionality does not exist in Nsight Compute.

Thanks, @jmarusarz.
