(When I run ‘psql -d postgres …’, PostgreSQL forks a process to handle my commands.
That process uses a GPU plugin, which creates multiple worker threads via ‘pthread_create’. The worker threads invoke CUDA kernel functions.)
Which version of nsys are you using (nsys --version)? Also, have you looked at the warnings in the report? They appear when you click in the top right corner.
I don’t have an answer either. We can’t use nvprof, as it no longer works on newer GPU generations. As I see it, NVIDIA should state whether these problems exist in the current Nsight Systems and what the plan is to fix the existing bugs. Only then can we as users, and more importantly as customers (we bought many A100, A40 and A10 GPUs), know whether the problem comes from our usage or from the product itself, and look for alternative solutions if necessary. I have seen several complaints about nsys missing multi-process CUDA info, and unfortunately there is no clear answer from NVIDIA yet.
I got an answer. You need to add “--trace=cuda” manually, even though the documentation says this flag is set by default. After adding it, CUDA info is available in all processes. Let me know if it works for you.
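
For anyone who wants to make the explicit trace flag hard to forget, here is a minimal sketch of a helper that assembles the nsys command line. It only builds and prints the command; it does not run anything. The helper name `build_nsys_command`, the report name `pg_report`, and the data directory path are my own placeholders, not something from this thread. I am also assuming the PostgreSQL server itself is launched under nsys (as `postgres -D <data dir>`), since the backend that calls the CUDA kernels is forked by the server rather than by psql. Adjust the target to however you actually start the server.

```python
import shlex


def build_nsys_command(target, output="report", trace=("cuda",)):
    """Build an `nsys profile` command line that sets --trace explicitly.

    `target` is the command to profile, given as a list of arguments.
    This helper is hypothetical; it is not part of Nsight Systems.
    """
    return [
        "nsys",
        "profile",
        "--trace=" + ",".join(trace),  # set explicitly instead of relying on the default
        "--output=" + output,          # base name of the generated report
        *target,
    ]


# Assumption: the server is started as `postgres -D <data dir>`.
# The path below is a placeholder, not a real location.
cmd = build_nsys_command(["postgres", "-D", "/path/to/pgdata"], output="pg_report")
print(shlex.join(cmd))
```

Copy the printed line into a shell to start the profiled server, then connect with psql as usual and check whether the forked backend shows CUDA activity in the report.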