Running
nsys profile --trace=cuda,nvtx,osrt -bfp <target> <args>
produces a .qdrep file which I can view in Nsight Systems.
This works fine when the process uses CUDA from two threads: the GPU node shows up and I can view the CUDA kernel profile info.
Running the same experiment using subprocesses instead yields warnings from Nsight Systems:
CUDA profiling might have not been started correctly.
Warning Analysis *box (:0:1:24482) 00:00.196
CUDA profiling might have not been started correctly.
Warning Analysis *box (:0:1:24482) 00:00.196
Zero CUDA events were collected. Does the application use CUDA?
According to the documentation, the profiler should trace subprocesses by default.
I am seeing similar behaviour when trying to profile from Nsight Systems on a host: the CUDA profiling doesn’t seem to take effect.
Is this expected behaviour?
(This is on Ubuntu 18.04 with CUDA 10.1, Nsight Systems version 2019.3.1)
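A rough sketch of the two setups being compared, for illustration only. It assumes a Python launcher, which the post does not state. The names `run_in_threads`, `run_in_subprocesses` and `WORKER_SNIPPET` are hypothetical, and the CUDA work is replaced by a placeholder that only reports the process ID.

```python
import os
import subprocess
import sys
import threading

# Placeholder for the real CUDA work (assumption: the actual target is not shown).
WORKER_SNIPPET = "import os; print(os.getpid())"


def run_in_threads(n=2):
    """Variant 1: every worker runs as a thread inside the profiled process."""
    pids = []

    def worker():
        # Real code would launch CUDA kernels here.
        pids.append(os.getpid())

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pids


def run_in_subprocesses(n=2):
    """Variant 2: every worker runs in its own child process."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_SNIPPET],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(n)
    ]
    # Each child prints its own PID; collect them after all children have started.
    return [int(p.communicate()[0]) for p in procs]


if __name__ == "__main__":
    print("parent pid:", os.getpid())
    print("thread worker pids:", run_in_threads())
    print("subprocess worker pids:", run_in_subprocesses())
```

In the threaded variant all workers report the PID of the process that nsys launched, while in the subprocess variant each worker reports a different PID, so the CUDA calls happen in child processes rather than in the launched process itself.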