I can collect CPU traces and NVTX ranges just fine; however, when I try to collect GPU metrics with the nsys profile --gpu-metrics-device=0 command, I get the error below and profiling stops:
EDIT: The profiler crashes whenever my application tries to load any ML models onto the GPU, even with --gpu-metrics-device=0 disabled. I can run the profiler with -t osrt,nvtx fine, but as soon as I add cuda to the trace options, the profiler crashes.
EDIT2: Here's a sample DeepStream pipeline causing the crash. The issue seems to be related to Docker Desktop?
I assume you have nvidia-docker installed. Can you please post the full docker command you are using? --privileged=true and --gpus all should be passed to the docker run command.
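For reference, a hypothetical invocation of that shape (the image name and mounted paths here are placeholders, not your actual setup) would be:

  docker run --rm -it --gpus all --privileged=true \
    -v /path/to/your/app:/workspace \
    <your-deepstream-image> /bin/bash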
The error message basically tells you that another instance/process is already using the resources needed to access GPU metrics. Can you check whether other processes, e.g. other Nsight Systems processes, are running?
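For example, something along these lines (assuming a standard Linux environment) will list any leftover Nsight Systems processes:

  ps aux | grep -i nsys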
Do you have an example execution for your first “EDIT”?
I couldn’t reproduce the crash from “EDIT2”. Can you check if removing osrt from the tracing options solves the issue? Can you also post the container image, the docker run command, and anything you additionally installed inside the container, so that I can try to reproduce the segmentation fault?
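In other words, a run of roughly this form (the application and output names are placeholders):

  nsys profile -t cuda,nvtx -o report <your-deepstream-app>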
I am starting to believe this is definitely related to Docker Desktop on Windows. Nsight (and the above commands) work as intended when we execute them on our k8s cluster running on T4 nodes.
Thanks for providing the details. I had missed the fact that you are running Docker Desktop on Windows. I tried Docker from Ubuntu, which works just fine with your commands, so it is likely related to Docker Desktop. We will try to reproduce it and look for a solution.
Update: I could reproduce the issue. Since the backtrace contains NSYS_OSRT, I removed osrt from the trace options, but then the application seems to fail silently. Removing (memory:NVMM) avoids the segmentation fault, but that's not a solution.
Indeed, the (memory:NVMM) caps are what actually make the DeepStream pipeline utilize the GPU (just simple cudaMemcpy2D calls).
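For context, a minimal pipeline of roughly this shape (placeholder elements, not our actual pipeline) is the kind of thing that requests NVMM device buffers:

  gst-launch-1.0 videotestsrc ! nvvideoconvert ! 'video/x-raw(memory:NVMM)' ! fakesink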
I’m able to collect some NVTX ranges from TensorRT, which is somewhat useful; however, having access to the CUDA traces and GPU utilization in our development environment (Docker on Windows) would be incredibly helpful.
Please let me know if I can provide any assistance in troubleshooting.
Looks like I have to pass the baton to another team. Nsight Systems uses CUPTI to gather CUDA activities, and enabling memcpy and memset activity collection with CUPTI is what causes the segmentation fault.
For now, you can disable recording of these two activity types by setting the nsys configuration as follows:

  NSYS_CONFIG_DIRECTIVES='CUPTIDisableMemcpyCollection=true;CUPTIDisableMemsetCollection=true' nsys profile ...
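For example, a complete invocation might look like this (the trace options and application name are just an illustration):

  NSYS_CONFIG_DIRECTIVES='CUPTIDisableMemcpyCollection=true;CUPTIDisableMemsetCollection=true' \
    nsys profile -t cuda,nvtx,osrt -o report <your-deepstream-app>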
Let us know if that doesn’t work for you.
We will follow up on the cause of the bug and let you know when it is fixed or if a more convenient workaround exists.