Does nsys have an option similar to '--profile-all-processes'? (Not getting CUDA information from child processes on Linux)


I am trying to use nsys to record CUDA execution information produced by a GPU plugin for PostgreSQL.

nsys profile --stats=true --force-overwrite true --gpu-metrics-device=all --trace-fork-before-exec=true  -o outputstrom1  psql -d postgres --command 'explain analyze SELECT *  FROM ar4, ar2 where ar4.key=ar2.key;'

(When I run 'psql -d postgres …', PostgreSQL forks a process to handle my command.
That process uses a GPU plugin, which creates multiple worker threads with 'pthread_create', and these worker threads launch the CUDA kernels.)
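In a standard PostgreSQL setup, the backend that executes a query is forked by the PostgreSQL server (the postmaster) when the client connects, not by psql itself. Before changing nsys options, it may help to confirm which process actually owns the CUDA worker threads. A minimal sketch, assuming PGDATA points at the cluster's data directory (the fallback path /var/lib/pgsql/data is an assumption based on the usual CentOS package layout); run it while the query is still executing:

```shell
#!/usr/bin/env bash
# Sketch: print the PostgreSQL server's process tree, including backends
# and their threads, while the query is running.
# Assumption: PGDATA is the data directory; the default below is hypothetical.
pidfile="${PGDATA:-/var/lib/pgsql/data}/postmaster.pid"

if [ -r "$pidfile" ] && command -v pstree >/dev/null 2>&1; then
  # The first line of postmaster.pid holds the server (postmaster) PID.
  server_pid=$(head -n 1 "$pidfile")
  # -p shows PIDs; threads (e.g. from pthread_create) are listed in {braces}.
  pstree -p "$server_pid"
else
  echo "cannot read $pidfile or pstree is missing; set PGDATA and retry" >&2
fi
```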

But the report only contained CPU information.

I can get the CUDA execution information with nvprof, by starting nvprof before launching 'psql':

nvprof --profile-all-processes -s -o strom.%p.nvvp

So I'm wondering whether nsys has an option similar to '--profile-all-processes'.
Or, how else can I use nsys to get the CUDA information?


Which version of nsys are you using (nsys --version)? Also, have you looked at the warnings in the report? They appear when you click the top right corner.

Sorry for the late reply.

$ nsys --version
NVIDIA Nsight Systems version 2021.2.1.58-642947b
NVIDIA-SMI 440.95.01    Driver Version: 440.95.01    CUDA Version: 11.4
Docker Image: nvidia/cuda:11.4.1-devel-centos8

The report shows that no CUDA events were collected, even though the child process does use CUDA.

Looking forward to your reply!

Did you find an answer? Same problem here.

Sorry, I didn't solve it, so I'm using nvprof instead.

If you find an answer, please remember to post it here.
Thank you

I don't have an answer either. We can't use nvprof, because nvprof does not support newer GPU generations. In my view, NVIDIA should state whether this problem exists in current Nsight Systems and what the plan is to fix it. Only then can we, as users and more importantly as customers (we bought many A100, A40 and A10 GPUs), tell whether the problem lies in how we use the tool or in the product itself, and look for alternatives if necessary. I have seen several complaints about nsys missing CUDA information from multiple processes, and unfortunately there is still no clear answer from NVIDIA.

I found the answer. You need to add '--trace=cuda' manually, even though the documentation says this flag is set by default. After adding this flag, CUDA information is available for all processes. Let me know if it works for you.
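A sketch of the command from the original question with '--trace=cuda' added explicitly, as suggested above. Apart from that flag, the options are copied from the question; the fallback that only prints the command when nsys is not on PATH is an addition for illustration. If you also want the other APIs nsys can trace, they can be listed comma-separated, e.g. '--trace=cuda,nvtx,osrt'.

```shell
#!/usr/bin/env bash
# Sketch: original nsys command plus an explicit --trace=cuda.
set -euo pipefail

query='explain analyze SELECT * FROM ar4, ar2 where ar4.key=ar2.key;'

# Request CUDA tracing explicitly instead of relying on the default.
nsys_args=(
  profile
  --trace=cuda
  --stats=true
  --force-overwrite true
  --gpu-metrics-device=all
  --trace-fork-before-exec=true
  -o outputstrom1
  psql -d postgres --command "$query"
)

if command -v nsys >/dev/null 2>&1; then
  nsys "${nsys_args[@]}"
else
  # nsys not on PATH: print the shell-quoted command that would have run.
  printf 'nsys'; printf ' %q' "${nsys_args[@]}"; printf '\n'
fi
```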