I am currently using the Nsight Compute CLI to profile DNN training. I use the following script to launch the program with the profiler attached. (I have abbreviated the metrics list, since it is not the core concern of this topic.) When I run this script with the appropriate command-line arguments, the profiler and the program run fine, and the log and report are created.
#!/bin/bash
path_to_report=$1
path_to_script=$2

/usr/local/NVIDIA-Nsight-Compute/ncu \
    --log-file ./temp-report \
    -o $path_to_report \
    --print-summary per-gpu \
    --target-processes all \
    --metrics some_metrics_blah_blah \
    --force \
    $path_to_script
However, I want to stop the profiler (or the profiling process) after a certain time (let's say 5 minutes) while keeping the program running. The profiler adds a lot of overhead and output, so I want to stop profiling but let the DNN training continue. When I tried to terminate the ncu-related process with the kill command, not only the ncu process but also the training program exited. Is there any way to achieve my objective?
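For concreteness, a timed kill of the kind described above might look like the sketch below. The helper name `kill_after` and the wrapper script name `profile.sh` are hypothetical, and the signal is an assumption (SIGTERM, the default for `kill`). This is only a way to reproduce the behavior, not a solution: as described, terminating ncu this way also ends the training program.

```shell
#!/bin/bash
# Hypothetical helper (name and arguments are illustrative):
# run a command in the background and send it SIGTERM after a delay.
kill_after() {
    local delay_seconds=$1
    shift

    "$@" &                      # launch the command, e.g. the ncu wrapper script
    local pid=$!

    # Timer: after the delay, send SIGTERM to the launched command.
    ( sleep "$delay_seconds"; kill -TERM "$pid" 2>/dev/null ) &
    local timer_pid=$!

    wait "$pid"                 # returns when the command exits or is killed
    local status=$?

    kill "$timer_pid" 2>/dev/null   # cancel the timer if the command finished first
    return "$status"
}

# Example (assumed file names): stop the profiler wrapper after 5 minutes.
# kill_after 300 ./profile.sh ./report ./train.sh
```

An exit status of 128 plus the signal number (143 for SIGTERM) from `kill_after` indicates that the command was stopped by the timer rather than finishing on its own.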