Hello.
I am currently using the Nsight Compute CLI to profile DNN training. I use the following script to launch the program under the profiler. (I have abbreviated the metrics list, since it is not the core concern of this topic.) When I run this script with the appropriate command-line arguments, the profiler and the program run fine, and the log and report are created.
#!/bin/bash
path_to_report=$1
path_to_script=$2
/usr/local/NVIDIA-Nsight-Compute/ncu \
--log-file ./temp-report \
-o $path_to_report \
--print-summary per-gpu \
--target-processes all \
--metrics some_metrics_blah_blah \
--force \
$path_to_script
However, I want to stop the profiler (or the profiling process) after a certain time, say 5 minutes, while keeping the program running. The profiler adds a large overhead and produces a large amount of information, so I want to stop profiling but let the DNN training continue. When I tried to terminate the ncu-related process with the kill command, not only the ncu process but also the training program exited. Is there any way to achieve this?
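One possible direction, sketched here under assumptions rather than as a confirmed solution: instead of killing ncu after a wall-clock timeout, bound the profiling by the number of profiled kernel launches with `--launch-count`. The assumption is that once that many launches have been profiled, ncu stops collecting results and the target keeps running (the separate `--kill` option exists to terminate the target at that point, which suggests it otherwise continues). This should be verified on the installed ncu version, especially together with `--target-processes all`. The helper name `run_ncu_limited`, the `DRY_RUN` switch, and the default of 1000 launches are hypothetical and only for illustration; a launch count is not a time limit, so the value would need tuning to approximate 5 minutes.

```shell
#!/bin/bash
# Sketch: limit profiling by kernel-launch count instead of wall-clock time.
# ASSUMPTION: after --launch-count launches are profiled, ncu stops collecting
# and the training program continues (no --kill is passed here).

# Hypothetical helper. Arguments: report path, script path, launch count.
# With DRY_RUN set, the command is printed instead of executed.
run_ncu_limited() {
    local report=$1
    local script=$2
    local launch_count=$3

    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
    ${DRY_RUN:+echo} /usr/local/NVIDIA-Nsight-Compute/ncu \
        --log-file ./temp-report \
        -o "$report" \
        --print-summary per-gpu \
        --target-processes all \
        --metrics some_metrics_blah_blah \
        --launch-count "$launch_count" \
        --force \
        "$script"
}

# Run only when the script is given at least a report path and a script path.
# The third argument (launch count) defaults to an arbitrary 1000.
if [ "$#" -ge 2 ]; then
    run_ncu_limited "$1" "$2" "${3:-1000}"
fi
```

Running with `DRY_RUN=1` prints the full ncu command line, which makes it easy to check the options before starting a long training run. `some_metrics_blah_blah` is the same placeholder as in the script above and must be replaced with real metric names.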