Can't run sample in DS container on A30 after following quickstart setup to the letter

Requested information:
DGPU - A30
Deepstream 6
TensorRT 8.0.1
GPU Driver 470.57.02 (but see below)

Issue:

We have been developing with Deepstream for over a year and have deployed containers on both Xavier NXs and a T4 VM in the Azure cloud. We recently acquired a server with 2x A30 dGPUs, which I’ve just set up following the Quickstart Guide (DeepStream 6.1.1 Release documentation).

There is one inconsistency in that guide - it says "Download and install CUDA Toolkit 11.4.1 from: https://developer.nvidia.com/cuda-11-4-1-download-archive

In this page, it is mentioned NVIDIA Linux GPU driver 470.57.02 but still current version DeepStream uses is 470.63.0"

As a native English speaker I do not understand this “it is mentioned” sentence. I followed the instructions to install CUDA 11.4.1 and I believe one of the things it did was change the GPU driver to 470.57.02. I do not know if this is relevant to the problem at hand.

I guess another inconsistency is that the NGC deepstream pages (DeepStream | NVIDIA NGC) say “We recommend using Docker 19.03 along with the latest nvidia-container-toolkit as described in the installation steps”, except the linked page gives no details on how to use 19.03 along with the latest nvidia-container-toolkit.

Anyway I don’t believe Docker is the problem because running the code outside of Docker in the Deepstream installation folder causes the same issue.

The problem is my code in my container fails with errors in source bins, OSD bins and sink bins. This same container works perfectly within my T4 VM (Driver Version: 470.86).

Since this server does not have an Nvidia video output, to give you a reproducible test I ran one of the NGC containers with the deepstream-6 sample (details below). This gets broadly the same errors, namely:

ERROR from sink_sub_bin_encoder1: Could not get/set settings from/on resource.
Debug info: gstv4l2object.c(3501): gst_v4l2_object_set_format_full (): /GstPipeline:pipeline/GstBin:processing_bin_0/GstBin:sink_bin/GstBin:sink_sub_bin1/nvv4l2h264enc:sink_sub_bin_encoder1:
Device is in streaming mode

This error repeats a bunch of times then the app quits.
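
Because the same message repeats many times, it can help to collapse the log into one line per failing element. The following is a minimal Python sketch, not part of DeepStream: the function name `summarize_errors` is hypothetical, and the patterns assume that every error looks like the two lines quoted above (an `ERROR from <element>: <message>` line followed by a `Debug info:` line whose path ends in `<factory>:<element>:`).

```python
import re
from collections import Counter

# Assumed log shape, taken from the two lines quoted in the post.
ERROR_RE = re.compile(r"^ERROR from (?P<element>[^:]+): (?P<message>.*)$")
# Last path component of the debug info: "/<factory>:<element name>:"
FACTORY_RE = re.compile(r"/(?P<factory>[^/:]+):(?P<name>[^/:]+):\s*$")


def summarize_errors(log_text):
    """Count (element, factory, message) triples in deepstream-app output."""
    counts = Counter()
    pending = None  # (element, message) still waiting for its debug line

    def flush(factory):
        counts[(pending[0], factory, pending[1])] += 1

    for raw in log_text.splitlines():
        line = raw.strip()
        m = ERROR_RE.match(line)
        if m:
            if pending:
                # Previous error had no debug line; record it anyway.
                flush("unknown")
            pending = (m.group("element"), m.group("message"))
        elif pending and line.startswith("Debug info:"):
            f = FACTORY_RE.search(line)
            flush(f.group("factory") if f else "unknown")
            pending = None
    if pending:
        flush("unknown")
    return counts
```

Run against the output above, this should report a single entry whose factory is `nvv4l2h264enc`, which points at the H.264 encoder element rather than at the source or OSD bins.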

Nvidia-SMI output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A30          On   | 00000000:21:00.0 Off |                    0 |
| N/A   26C    P0    28W / 165W |      0MiB / 24258MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A30          On   | 00000000:81:00.0 Off |                    0 |
| N/A   26C    P0    27W / 165W |      0MiB / 24258MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

To reproduce:
Follow Deepstream 6.0 quick start guide for dGPU.
docker run --gpus '"'device=0'"' -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-6.0 nvcr.io/nvidia/deepstream:6.0-samples
cd /opt/nvidia/deepstream/deepstream-6.0/samples/configs/deepstream-app
apt install nano
nano source30_1080p_dec_infer-resnet_tiled_display_int8.txt
(in nano, disable sink0 and enable sink1 or sink2, so that output goes to a file or to RTSP instead of the display)
deepstream-app -c source30_1080p_dec_infer-resnet_tiled_display_int8.txt

You will see the errors noted above.

Small update. I uninstalled CUDA, replaced the driver with 470.63.0 and re-installed CUDA from the .run file, making sure not to install the 470.57.02 driver (you don’t get this option if you use the .deb option).

Anyway, we still have the same issue, so we can rule the driver version out as the cause.

OK, I’ve realized what the problem is: the A30 doesn’t have any hardware H.264/H.265 encoders (NVENC).

Adding ‘enc-type=1’ to the sink groups in the config file switches them to the software encoder, which gets the sample to work at the price of huge CPU load. Astonishing that the A30/A100 don’t have hardware encoders, but here we are.
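
For anyone who would rather script the ‘enc-type=1’ change than edit the file by hand, here is a minimal Python sketch. It is an illustration under assumptions, not an NVIDIA tool: the helper name `set_group_key` is hypothetical, and it assumes the config keeps the usual `[group]` / `key=value` layout of the deepstream-app samples. It rewrites the key if the group already has it and appends it otherwise.

```python
import re

HEADER = re.compile(r"^\s*\[(.+?)\]\s*$")


def set_group_key(config_text, group, key, value):
    """Set key=value inside [group], rewriting or appending the line.

    Comment lines (starting with '#') and all other groups are left
    untouched. Raises ValueError if the group is not present.
    """
    key_line = re.compile(rf"^\s*{re.escape(key)}\s*=")
    out = []
    in_group = False  # currently inside the target group
    seen = False      # target group header was found
    done = False      # key has been written

    def append_key():
        # Insert before any blank lines that trail the group.
        i = len(out)
        while i > 0 and out[i - 1].strip() == "":
            i -= 1
        out.insert(i, f"{key}={value}")

    for line in config_text.splitlines():
        m = HEADER.match(line)
        if m:
            if in_group and not done:
                # Leaving the target group without having seen the key.
                append_key()
                done = True
            in_group = m.group(1) == group
            seen = seen or in_group
        elif in_group and key_line.match(line):
            line = f"{key}={value}"
            done = True
        out.append(line)

    if not seen:
        raise ValueError(f"group [{group}] not found")
    if in_group and not done:
        append_key()
    return "\n".join(out) + "\n"
```

To use it, read the sample config into a string, call `set_group_key(text, "sink1", "enc-type", "1")` (and again with `"enable", "1"` if the sink is still disabled), then write the result back. Keys with the same name in other groups, such as `enable` under `[sink0]`, are not changed.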

I still think the documentation is bad, but I guess this can be closed.

Glad to know you found the cause.
