Hi mengda.yang,
The CLI command you have used launches the application and profiles until the app exits.
./nsys profile -o <filename.qdstrm>
Here you are relying on the default CLI options to profile your application. By default, the CLI traces CUDA, OpenGL, NVTX, and OS runtime (osrt) APIs, and CPU sampling is also turned on. To see the full list of CLI options and their defaults, run ./nsys profile --help
To profile for a shorter duration, use the --duration=X switch, where X is the collection time in seconds. This was hwilper’s first suggestion.
To trace only CUDA APIs, use the --trace=cuda option; this turns off tracing of APIs from all the other libraries. To turn off CPU sampling, use --sample=none. This was hwilper’s second suggestion.
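Putting both suggestions together, a command line might look roughly like the sketch below. The application path (./my_app), the output name, and the 10-second duration are placeholders I made up for illustration; only the switches themselves come from the discussion above. The script just assembles and prints the command so you can check it before running it.

```shell
#!/bin/sh
# Sketch only: APP, OUT and DURATION_S are hypothetical placeholder values.
NSYS="./nsys"              # CLI binary, as invoked earlier in this thread
APP="./my_app"             # assumption: the application you want to profile
OUT="short_cuda_only"      # assumption: name for the output file
DURATION_S=10              # assumption: collect for 10 seconds

# --duration limits the collection time,
# --trace=cuda restricts API tracing to CUDA,
# --sample=none turns off CPU sampling.
CMD="$NSYS profile --duration=$DURATION_S --trace=cuda --sample=none -o $OUT $APP"

# Print the command for inspection; paste it into your shell to run it.
echo "$CMD"
```

If the report is still larger than you would like, lowering the duration value is probably the first thing to try.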