Requested information:
dGPU: A30
DeepStream 6.0
TensorRT 8.0.1
GPU driver 470.57.02 (but see below)
Issue:
We have been developing with DeepStream for over a year and have deployed containers on both Xavier NXs and a T4 VM in the Azure cloud. We recently acquired a server with two A30 dGPUs, and I’ve just set it up following the guide at Quickstart Guide — DeepStream 6.1.1 Release documentation.
There is one inconsistency in that guide. It says: "Download and install CUDA Toolkit 11.4.1 from: https://developer.nvidia.com/cuda-11-4-1-download-archive
In this page, it is mentioned NVIDIA Linux GPU driver 470.57.02 but still current version DeepStream uses is 470.63.0"
Even as a native English speaker, I cannot work out what this “it is mentioned” sentence means. I followed the instructions to install CUDA 11.4.1, and I believe one of the things the installer did was change the GPU driver to 470.57.02. I do not know whether this is relevant to the problem.
Another inconsistency: the NGC DeepStream page (DeepStream | NVIDIA NGC) says “We recommend using Docker 19.03 along with the latest nvidia-container-toolkit as described in the installation steps”, but the linked page gives no details on how to use Docker 19.03 with the latest nvidia-container-toolkit.
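For anyone who wants to compare their own setup against these version requirements, here is a minimal Bash sketch. The helper names (`version_ge`, `installed_driver`, `docker_server_version`, `check_min`) are hypothetical and not part of DeepStream; it assumes GNU coreutils for `sort -V`, and the minimum versions are arguments you supply rather than values taken from the guides.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing installed versions against a minimum.
# Not part of DeepStream. Assumes GNU coreutils (sort -V).

# version_ge A B: succeed (exit 0) if version A >= version B.
version_ge() {
  # sort -V orders version strings; -C quietly checks that the input is
  # already sorted. The pair "B, A" is sorted exactly when B <= A.
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}

# Print the NVIDIA driver version reported for the first GPU.
installed_driver() {
  nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Print the version of the Docker daemon (server side).
docker_server_version() {
  docker version --format '{{.Server.Version}}'
}

# check_min LABEL INSTALLED REQUIRED: report whether INSTALLED >= REQUIRED.
check_min() {
  if version_ge "$2" "$3"; then
    echo "$1: $2 (OK, >= $3)"
  else
    echo "$1: $2 (TOO OLD, need >= $3)"
  fi
}
```

For example, `check_min "Docker" "$(docker_server_version)" 19.03` checks the Docker recommendation, and `check_min "GPU driver" "$(installed_driver)" 470.86` compares this server against the driver on our working T4 VM (470.86 is just that VM's version, not an official requirement).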
In any case, I don’t believe Docker is the problem, because running the same code outside Docker, from the DeepStream installation folder, produces the same errors.
The problem is that my application container fails on this server with errors in the source, OSD and sink bins. The same container works perfectly on my T4 VM (driver version 470.86).
Since this server has no NVIDIA video output, I made a reproducible test by running one of the NGC containers with a DeepStream 6.0 sample (details below). It produces broadly the same errors, namely:
ERROR from sink_sub_bin_encoder1: Could not get/set settings from/on resource.
Debug info: gstv4l2object.c(3501): gst_v4l2_object_set_format_full (): /GstPipeline:pipeline/GstBin:processing_bin_0/GstBin:sink_bin/GstBin:sink_sub_bin1/nvv4l2h264enc:sink_sub_bin_encoder1:
Device is in streaming mode
This error repeats several times, and then the app quits.
nvidia-smi output:
±----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------±---------------------±---------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A30 On | 00000000:21:00.0 Off | 0 |
| N/A 26C P0 28W / 165W | 0MiB / 24258MiB | 0% Default |
| | | Disabled |
±------------------------------±---------------------±---------------------+
| 1 NVIDIA A30 On | 00000000:81:00.0 Off | 0 |
| N/A 26C P0 27W / 165W | 0MiB / 24258MiB | 0% Default |
| | | Disabled |
±------------------------------±---------------------±---------------------+
To reproduce:
Follow the DeepStream 6.0 Quickstart Guide for dGPU.
docker run --gpus '"device=0"' -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-6.0 nvcr.io/nvidia/deepstream:6.0-samples
cd /opt/nvidia/deepstream/deepstream-6.0/samples/configs/deepstream-app
apt install nano
nano source30_1080p_dec_infer-resnet_tiled_display_int8.txt
(In nano, disable sink0 and enable either sink1, to write to a file, or sink2, to stream over RTSP.)
deepstream-app -c source30_1080p_dec_infer-resnet_tiled_display_int8.txt
You will see the errors noted above.
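The nano edit in the steps above can also be scripted, which makes the reproduction fully non-interactive. This is a minimal awk-based sketch that assumes the sample config declares its sinks as `[sink0]`, `[sink1]`, `[sink2]` sections, each with an `enable=` line; `toggle_sinks` is a hypothetical helper, not a DeepStream tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not a DeepStream tool. Assumes deepstream-app style
# configs where each sink is a [sinkN] section containing an "enable=" line.

# toggle_sinks FILE "N [M ...]": print FILE to stdout with enable=1 in the
# listed [sinkN] sections and enable=0 in every other [sinkN] section.
# All other lines, including comments and non-sink sections, are unchanged.
toggle_sinks() {
  awk -v enabled=" $2 " '
    # On each section header, remember N if it is [sinkN]; otherwise clear it.
    /^\[/ {
      sink = ""
      if (match($0, /^\[sink[0-9]+\]/)) {
        sink = substr($0, 6, RLENGTH - 6)
      }
    }
    # Inside a [sinkN] section, rewrite the enable key (drops any inline comment).
    sink != "" && /^[ \t]*enable[ \t]*=/ {
      print "enable=" (index(enabled, " " sink " ") ? 1 : 0)
      next
    }
    { print }
  ' "$1"
}
```

For example, `toggle_sinks source30_1080p_dec_infer-resnet_tiled_display_int8.txt 1 > sink1_test.txt` followed by `deepstream-app -c sink1_test.txt` should run with only the file sink enabled; pass `2` instead to try RTSP. Writing the copy into the same folder should keep any relative paths in the config resolving the same way.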