I am trying to profile the GPU usage of my DeepStream application running in the latest DS 6.1.1 image from NVIDIA NGC.
I have installed the latest cuda-nsight-systems-11-7 package and am running the container with --privileged=true as described in User Guide :: Nsight Systems Documentation (nvidia.com).
I can collect CPU traces and NVTX ranges just fine. However, when I try to collect GPU metrics with the nsys profile --gpu-metrics-device=0 command, I get the error below and profiling stops:
Am I doing something wrong?
Thanks in advance,
EDIT: The profiler crashes whenever my application tries to load any ML models onto the GPU, even when --gpu-metrics-device=0 is not set. I can run the profiler with -t osrt,nvtx without problems, but if I add cuda the profiler crashes.
EDIT2: Here's a sample DeepStream pipeline causing the crash. The issue seems to be related to Docker Desktop?
@rdietrich can you take a look at this?
I assume that you have nvidia-docker installed. Can you please post the full docker command you are using?
--gpus all should be passed to the docker run command.
The error message basically tells you that another instance/process is already using the resources needed to access GPU metrics. Can you check whether other processes, e.g. other Nsight Systems processes, are running?
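One way to run that check, as a minimal sketch: the helper name list_procs_by_name is hypothetical, and the snippet assumes pgrep (from the procps package) is available inside the container. It only looks for processes named like nsys; other tools that hold the GPU metrics resources would need their own pattern.

```shell
#!/bin/sh
# Hypothetical helper: list running processes whose name matches a pattern.
# Assumes pgrep (procps) is installed; prints "none" when pgrep finds no match.
list_procs_by_name() {
    # -a prints the PID together with the full command line of each match
    pgrep -a -- "$1" || echo "none"
}

# Look for other Nsight Systems processes that might already be collecting GPU metrics.
list_procs_by_name 'nsys'
```

If this prints anything other than "none", stopping those processes before starting a new profiling session is worth trying.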
Do you have an example execution for your first “EDIT”?
I couldn’t reproduce the crash from “EDIT2”. Can you check whether removing osrt from the tracing options solves the issue? Can you post the container image, the docker run command, and what you additionally installed inside the Docker container, so that I can try to reproduce the segmentation fault?
Hi @rdietrich ,
Here are some simple commands to reproduce the issue:
docker run -it --gpus 0 --privileged nvcr.io/nvidia/deepstream:6.1.1-base
apt-get install -y cuda-nsight-systems-11-7 gdb
nsys profile -t osrt,nvtx gst-launch-1.0 videotestsrc ! nvvideoconvert ! "video/x-raw(memory:NVMM)" ! fakesink
nsys profile -t osrt,nvtx,cuda gst-launch-1.0 videotestsrc ! nvvideoconvert ! "video/x-raw(memory:NVMM)" ! fakesink
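To compare the two trace configurations side by side, a small sketch like the following could be used. The helper names report_status and profile_with are hypothetical, and num-buffers=100 is added here (it is not in the commands above) so that videotestsrc stops on its own instead of running until interrupted. Whether the exit status returned by nsys reflects a crash in the profiled pipeline is an assumption, so treat the result as a first filter only.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output, and print its exit status.
# A status above 128 usually means the process was killed by a signal
# (139 = 128 + 11, i.e. SIGSEGV).
report_status() {
    label="$1"
    shift
    "$@" >/dev/null 2>&1
    rc=$?
    echo "$label: exit status $rc"
}

# Hypothetical helper: profile the test pipeline from this thread with the given -t value.
profile_with() {
    report_status "$1" nsys profile -t "$1" \
        gst-launch-1.0 videotestsrc num-buffers=100 ! nvvideoconvert ! \
        'video/x-raw(memory:NVMM)' ! fakesink
}

# Only run the comparison when nsys is actually installed.
if command -v nsys >/dev/null 2>&1; then
    profile_with osrt,nvtx
    profile_with osrt,nvtx,cuda
else
    echo "nsys not found; skipping"
fi
```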
I am starting to believe that this is definitely related to Docker Desktop on Windows. Nsight Systems (and the above commands) work as intended when we execute them on our k8s cluster running on T4 nodes.
Here's some platform information:
Nvidia driver: 516.94
WSL2 Kernel: 220.127.116.11
nsys status -e
If you need anything else please let me know.
Thanks for providing the details. I missed the fact that you are running Docker Desktop on Windows. I tried Docker on Ubuntu, which works just fine with your commands, so the problem is likely related to Docker Desktop. We will try to reproduce it and look for a solution.
Update: I could reproduce the issue. Since the backtrace contains NSYS_OSRT, I removed osrt from the trace options, but the application then seems to fail silently. Removing (memory:NVMM) avoids the segmentation fault, but that’s not a solution.
@rdietrich Thanks for the update and effort,
Indeed, (memory:NVMM) is what actually makes the DeepStream pipeline use the GPU (just simple cudaMemcpy2D calls).
I’m able to collect some NVTX ranges from TensorRT, which is somewhat useful. However, having access to the CUDA traces and GPU utilization in our development environment (Docker on Windows) would be incredibly helpful.
Please let me know if I can provide any assistance in troubleshooting,
Looks like I have to pass the baton to another team. Nsight Systems uses CUPTI to gather CUDA activities, and enabling memcpy and memset activity collection with CUPTI causes the segmentation fault.
For now, you can disable recording of these two activity types by setting the nsys configuration as follows:
NSYS_CONFIG_DIRECTIVES='CUPTIDisableMemcpyCollection=true;CUPTIDisableMemsetCollection=true' nsys profile ...
Let us know if that doesn’t work for you.
We will follow up on the cause of the bug and let you know when it is fixed or if a more convenient workaround exists.
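For repeated runs, the workaround can be wrapped in a small shell function. This is only a sketch: the function name with_cupti_workaround is hypothetical, and the directive string is copied verbatim from the command above. Using env keeps the variable scoped to the single command instead of exporting it into the calling shell.

```shell
#!/bin/sh
# Hypothetical wrapper: run any command with the two CUPTI directives from this
# thread set in NSYS_CONFIG_DIRECTIVES, for that command only.
with_cupti_workaround() {
    env NSYS_CONFIG_DIRECTIVES='CUPTIDisableMemcpyCollection=true;CUPTIDisableMemsetCollection=true' "$@"
}

# Example usage (not executed here):
#   with_cupti_workaround nsys profile -t osrt,nvtx,cuda gst-launch-1.0 videotestsrc ! \
#       nvvideoconvert ! 'video/x-raw(memory:NVMM)' ! fakesink
```

With memcpy and memset collection disabled, those two activity types will be missing from the resulting report, so the trace is incomplete by design.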
Thank you for providing a temporary workaround.
We will be looking forward to a proper fix in the future,