Cuda failure in Deepstream docker on Centos 7

akos.peter.szabo · May 29, 2020, 2:53pm

Hi,

Lately we have been testing different hardware-accelerated video analytics solutions, and DeepStream seems promising.
We would like to try it out on our MEC device, which has two Quadro RTX 8000 GPUs and NVIDIA GRID installed.
The host OS is CentOS 7.

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     42351    C+G   vgpu                                        4056MiB |
|    0     42997    C+G   vgpu                                        4056MiB |
|    0     43192    C+G   vgpu                                        4056MiB |
|    0     43255    C+G   vgpu                                        4056MiB |
|    0     43612    C+G   vgpu                                        4056MiB |
|    0     43982    C+G   vgpu                                        4056MiB |
|    0     44081    C+G   vgpu                                        4056MiB |
|    0     44138    C+G   vgpu                                        4056MiB |
|    0     44535    C+G   vgpu                                        4056MiB |
|    0     44790    C+G   vgpu                                        4056MiB |
|    0     44905    C+G   vgpu                                        4056MiB |
|    0     45553    C+G   vgpu                                        4056MiB |
|    1     45039    C+G   vgpu                                        3042MiB |
|    1     45698    C+G   vgpu                                        3042MiB |
|    1     45786    C+G   vgpu                                        3042MiB |
|    1     45934    C+G   vgpu                                        3042MiB |
|    1     46442    C+G   vgpu                                        3042MiB |
|    1     46581    C+G   vgpu                                        3042MiB |
|    1     46712    C+G   vgpu                                        3042MiB |
|    1     46792    C+G   vgpu                                        3042MiB |
|    1     47388    C+G   vgpu                                        3042MiB |
|    1     47505    C+G   vgpu                                        3042MiB |
|    1     47693    C+G   vgpu                                        3042MiB |
|    1     47743    C+G   vgpu                                        3042MiB |
|    1     48318    C+G   vgpu                                        3042MiB |
|    1     48461    C+G   vgpu                                        3042MiB |
|    1     48652    C+G   vgpu                                        3042MiB |
|    1     48727    C+G   vgpu                                        3042MiB |
+-----------------------------------------------------------------------------+

[root@198 ~]# modinfo nvidia
filename: /lib/modules/3.10.0-1062.9.1.el7.x86_64/weak-updates/nvidia/nvidia.ko
alias: char-major-195-*
version: 430.46
supported: external
license: NVIDIA
retpoline: Y
rhelversion: 7.7
srcversion: 922226EAFE970320108DB9A
alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias: pci:v000010DEd*sv*sd*bc03sc02i00*
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends: ipmi_msghandler
vermagic: 3.10.0-1057.el7.x86_64 SMP mod_unload modversions
parm: NvSwitchRegDwords:NvSwitch regkey (charp)
parm: NVreg_Mobile:int
parm: NVreg_ResmanDebugLevel:int
parm: NVreg_RmLogonRC:int
parm: NVreg_ModifyDeviceFiles:int
parm: NVreg_DeviceFileUID:int
parm: NVreg_DeviceFileGID:int
parm: NVreg_DeviceFileMode:int
parm: NVreg_InitializeSystemMemoryAllocations:int
parm: NVreg_UsePageAttributeTable:int
parm: NVreg_MapRegistersEarly:int
parm: NVreg_RegisterForACPIEvents:int
parm: NVreg_EnablePCIeGen3:int
parm: NVreg_EnableMSI:int
parm: NVreg_TCEBypassMode:int
parm: NVreg_EnableStreamMemOPs:int
parm: NVreg_EnableBacklightHandler:int
parm: NVreg_RestrictProfilingToAdminUsers:int
parm: NVreg_PreserveVideoMemoryAllocations:int
parm: NVreg_DynamicPowerManagement:int
parm: NVreg_EnableUserNUMAManagement:int
parm: NVreg_MemoryPoolSize:int
parm: NVreg_KMallocHeapMaxSize:int
parm: NVreg_VMallocHeapMaxSize:int
parm: NVreg_IgnoreMMIOCheck:int
parm: NVreg_NvLinkDisable:int
parm: NVreg_RegistryDwords:charp
parm: NVreg_RegistryDwordsPerDevice:charp
parm: NVreg_RmMsg:charp
parm: NVreg_GpuBlacklist:charp
parm: NVreg_TemporaryFilePath:charp
parm: NVreg_AssignGpus:charp

Since we are not planning to update our NVIDIA driver for now, and DeepStream is not yet supported on CentOS, we tried the dockerized version: nvcr.io/nvidia/deepstream:4.0.2-19.12-devel.

We start it with the following command:
[centos@hp-gpu-node1 ~]$ docker run --gpus all -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix --env="DISPLAY" --net=host -e DISPLAY=$DISPLAY -w /opt/nvidia/deepstream/deepstream-4.0 --volume="$HOME/.Xauthority:/root/.Xauthority:rw" nvcr.io/nvidia/deepstream:4.0.2-19.12-devel

We then try to run deepstream-test1-app, but get the following error:

root@hp-gpu-node1:/opt/nvidia/deepstream/deepstream-4.0# cd ~/deepstream_sdk_v4.0.2_x86_64/sources/apps/sample_apps/deepstream-test1/
root@hp-gpu-node1:~/deepstream_sdk_v4.0.2_x86_64/sources/apps/sample_apps/deepstream-test1# deepstream-test1-app …/…/…/…/samples/streams/sample_720p.h264
Now playing: …/…/…/…/samples/streams/sample_720p.h264
libEGL warning: DRI3: failed to query the version
libEGL warning: DRI2: failed to authenticate
Creating LL OSD context new
0:00:08.188922891 10 0x557358461430 INFO nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger: NvDsInferContext[UID 1]:initialize(): Trying to create engine from model files
0:00:18.115738173 10 0x557358461430 INFO nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger: NvDsInferContext[UID 1]:generateTRTModel(): Storing the serialized cuda engine to file at /root/deepstream_sdk_v4.0.2_x86_64/samples/models/Primary_Detector/resnet10.caffemodel_b1_int8.engine
Running…
Cuda failure: status=801

Could you please help us figure out what the problem could be?
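
A note on the status code in the log above: to my understanding, in the CUDA runtime headers from CUDA 10.1 onward, the value 801 corresponds to cudaErrorNotSupported (the attempted operation is not supported on the current system or device). The sketch below is a hypothetical lookup helper, cuda_status_name, that covers only a handful of codes. The table is an assumption based on that header and should be checked against the driver_types.h shipped in the container.

```shell
# Hypothetical helper: map a CUDA runtime status number to its enum name.
# Assumption: numbering as in the CUDA 10.1+ driver_types.h; only a few
# codes are listed here, this is not the full cudaError_t enum.
cuda_status_name() {
    case "$1" in
        0)   echo "cudaSuccess" ;;
        1)   echo "cudaErrorInvalidValue" ;;
        2)   echo "cudaErrorMemoryAllocation" ;;
        35)  echo "cudaErrorInsufficientDriver" ;;
        100) echo "cudaErrorNoDevice" ;;
        801) echo "cudaErrorNotSupported" ;;
        999) echo "cudaErrorUnknown" ;;
        *)   echo "unknown status $1 (not in this small table)" ;;
    esac
}

# Look up the status reported by deepstream-test1-app.
cuda_status_name 801   # prints: cudaErrorNotSupported
```

The name alone does not say which CUDA call failed or why; it only narrows the search to an unsupported operation rather than, say, an out-of-memory condition.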

mdegans · May 29, 2020, 3:59pm

It could be that your driver version (430.46) is too low. It’s probably what’s in the package repositories, but it’s not new enough, so you have to install a newer one using Nvidia’s instructions for your distro.

I haven’t used CentOS in a while, but this is the case on Ubuntu, where an extra apt repo must be added (or the driver built using the .run file, but that will break on a kernel update). If you have secure boot enabled, I’d recommend the DKMS package (if it exists for CentOS). At least on Ubuntu, this is the easiest way to configure automatic module signing. Without it you’ll have to manually sign the kernel module on every update (or disable secure boot, which is what most people do, and which is bad).

akos.peter.szabo · June 2, 2020, 3:44pm

Thanks for the tip.
According to this table, DeepStream SDK 4.0.2 is supported from R418+:

So I think the driver version should be OK for this deepstream version.
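
The comparison above can be sketched as a small shell check. It assumes that "R418+" means "driver release branch 418 or later"; the function name driver_meets_branch is hypothetical and only the number before the first dot is compared.

```shell
# Hypothetical helper: succeed if a driver version string (e.g. "430.46")
# belongs to a release branch at or above the required one (e.g. 418).
# Assumption: "R418+" means release branch 418 or later.
driver_meets_branch() {
    major=${1%%.*}            # text before the first dot: "430.46" -> "430"
    [ "$major" -ge "$2" ]
}

# On a real host the version string could be read with:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Here the value from the modinfo output in the first post is used.
if driver_meets_branch "430.46" 418; then
    echo "430.46 satisfies R418+"
else
    echo "430.46 is older than R418"
fi
```

This only compares release branches. It says nothing about whether a particular driver flavour or GPU configuration supports every CUDA feature that DeepStream uses.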

Could you please point me to the missing apt repo, or to proper guidance for installing the Docker version on CentOS?

I suspect that some host-to-container CUDA linkage is missing, but I can’t figure out the exact solution.

mdegans · June 2, 2020, 4:17pm

Sorry. I didn’t realize you were at version 4. That’s my fault for not reading closely enough. I think somebody from Nvidia will have to answer your question in this case since I’m not sure what the issue is.

mchi · June 2, 2020, 4:18pm

Thanks, @mdegans!

Hi @akos.peter.szabo,
could you try the change below, then rebuild and run again with “./deepstream-test1-app …/…/…/…/samples/streams/sample_720p.h264”?

--- a/deepstream_test1_app.c
+++ b/deepstream_test1_app.c
@@ -203,7 +203,8 @@ main (int argc, char *argv[])
 #ifdef PLATFORM_TEGRA
   transform = gst_element_factory_make ("nvegltransform", "nvegl-transform");
 #endif
-  sink = gst_element_factory_make ("nveglglessink", "nvvideo-renderer");
+  //sink = gst_element_factory_make ("nveglglessink", "nvvideo-renderer");
+  sink = gst_element_factory_make ("fakesink", "nvvideo-renderer");

   if (!source || !h264parser || !decoder || !pgie
       || !nvvidconv || !nvosd || !sink) {

Also, could you try running it under cuda-gdb to capture a backtrace:

cuda-gdb ./deepstream-test1-app …/…/…/…/samples/streams/sample_720p.h264

After the crash, enter “bt” to get the backtrace.

Thanks!

akos.peter.szabo · June 3, 2020, 1:59pm

Hi @mchi,

I tried the modified application, but CUDA still shows failure.

I also tried to run it with cuda-gdb, but the application does not crash; it only prints “Cuda failure: status=801”.
So I couldn’t get a backtrace.

Thanks,
Akos

mchi · June 3, 2020, 2:39pm

Could you try all three commands below?

$ gst-launch-1.0 filesrc location=…/…/…/…/samples/streams/sample_720p.h264 ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=1920 height=1080 batch-size=1 batched-push-timeout=4000000 ! nvinfer config-file-path=dstest1_pgie_config.txt ! nvvideoconvert ! nvdsosd ! fakesink

$ gst-launch-1.0 filesrc location=…/…/…/…/samples/streams/sample_720p.h264 ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=1920 height=1080 batch-size=1 batched-push-timeout=4000000 ! nvinfer config-file-path=dstest1_pgie_config.txt ! nvvideoconvert ! fakesink

$ gst-launch-1.0 filesrc location=…/…/…/…/samples/streams/sample_720p.h264 ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=1920 height=1080 batch-size=1 batched-push-timeout=4000000 ! nvinfer config-file-path=dstest1_pgie_config.txt ! fakesink

akos.peter.szabo · June 8, 2020, 2:06pm

Hi mchi,

I tried these three commands; all of them give the same error (Cuda failure: status=801).

root@hp-gpu-node1:~/deepstream_sdk_v4.0.2_x86_64/sources/apps/sample_apps/deepstream-test1# gst-launch-1.0 filesrc location=…/…/…/…/samples/streams/sample_720p.h264 ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=1920 height=1080 batch-size=1 batched-push-timeout=4000000 ! nvinfer config-file-path=dstest1_pgie_config.txt ! nvvideoconvert ! nvdsosd ! fakesink
Setting pipeline to PAUSED …
Creating LL OSD context new
0:00:00.804678715 617 0x556678d4b550 INFO nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger: NvDsInferContext[UID 1]:initialize(): Trying to create engine from model files
0:00:10.661127189 617 0x556678d4b550 INFO nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger: NvDsInferContext[UID 1]:generateTRTModel(): Storing the serialized cuda engine to file at /root/deepstream_sdk_v4.0.2_x86_64/samples/models/Primary_Detector/resnet10.caffemodel_b1_int8.engine
Pipeline is PREROLLING …
Cuda failure: status=801
^Chandling interrupt.
Interrupt: Stopping pipeline …
ERROR: pipeline doesn’t want to preroll.
Setting pipeline to NULL …

^C
root@hp-gpu-node1:~/deepstream_sdk_v4.0.2_x86_64/sources/apps/sample_apps/deepstream-test1#
root@hp-gpu-node1:~/deepstream_sdk_v4.0.2_x86_64/sources/apps/sample_apps/deepstream-test1# gst-launch-1.0 filesrc location=…/…/…/…/samples/streams/sample_720p.h264 ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=1920 height=1080 batch-size=1 batched-push-timeout=4000000 ! nvinfer config-file-path=dstest1_pgie_config.txt ! nvvideoconvert ! fakesink
Setting pipeline to PAUSED …
0:00:00.815580425 627 0x5653391b7920 INFO nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger: NvDsInferContext[UID 1]:initialize(): Trying to create engine from model files
0:00:10.758174525 627 0x5653391b7920 INFO nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger: NvDsInferContext[UID 1]:generateTRTModel(): Storing the serialized cuda engine to file at /root/deepstream_sdk_v4.0.2_x86_64/samples/models/Primary_Detector/resnet10.caffemodel_b1_int8.engine
Pipeline is PREROLLING …
Cuda failure: status=801
^Chandling interrupt.
Interrupt: Stopping pipeline …
ERROR: pipeline doesn’t want to preroll.
Setting pipeline to NULL …
^C
root@hp-gpu-node1:~/deepstream_sdk_v4.0.2_x86_64/sources/apps/sample_apps/deepstream-test1#
root@hp-gpu-node1:~/deepstream_sdk_v4.0.2_x86_64/sources/apps/sample_apps/deepstream-test1#
root@hp-gpu-node1:~/deepstream_sdk_v4.0.2_x86_64/sources/apps/sample_apps/deepstream-test1#
root@hp-gpu-node1:~/deepstream_sdk_v4.0.2_x86_64/sources/apps/sample_apps/deepstream-test1# gst-launch-1.0 filesrc location=…/…/…/…/samples/streams/sample_720p.h264 ! h264parse ! nvv4l2decoder ! m.sink_0 nvstreammux name=m width=1920 height=1080 batch-size=1 batched-push-timeout=4000000 ! nvinfer config-file-path=dstest1_pgie_config.txt ! fakesink
Setting pipeline to PAUSED …
0:00:00.792476123 637 0x55b01adc3a60 INFO nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger: NvDsInferContext[UID 1]:initialize(): Trying to create engine from model files
0:00:10.393224037 637 0x55b01adc3a60 INFO nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger: NvDsInferContext[UID 1]:generateTRTModel(): Storing the serialized cuda engine to file at /root/deepstream_sdk_v4.0.2_x86_64/samples/models/Primary_Detector/resnet10.caffemodel_b1_int8.engine
Pipeline is PREROLLING …
Cuda failure: status=801
^Chandling interrupt.
Interrupt: Stopping pipeline …
ERROR: pipeline doesn’t want to preroll.
Setting pipeline to NULL …
^C
root@hp-gpu-node1:~/deepstream_sdk_v4.0.2_x86_64/sources/apps/sample_apps/deepstream-test1#
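
When several pipeline variants are compared like this, it can help to pull just the status codes out of the captured output. The following is a minimal sketch; the function name cuda_failure_statuses is hypothetical, and it assumes the message format "Cuda failure: status=N" seen in the logs above.

```shell
# Hypothetical helper: print each distinct "Cuda failure" status found on
# stdin, one per line. Assumes the "Cuda failure: status=N" format above.
cuda_failure_statuses() {
    sed -n 's/^.*Cuda failure: status=\([0-9][0-9]*\).*$/\1/p' | sort -u
}

# Example with lines copied from the output above. In practice the input
# could be a saved log or the combined stdout/stderr of a pipeline run.
cuda_failure_statuses <<'EOF'
Creating LL OSD context new
Cuda failure: status=801
^Chandling interrupt.
EOF
```

For the three runs above this would report the single value 801, which matches the observation that removing nvdsosd and nvvideoconvert from the pipeline does not change the failure.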

mchi · June 9, 2020, 4:08pm

Can you run any CUDA sample and TensorRT sample on your system?

Thanks!

kayccc · June 23, 2020, 1:14am

Hi

Is this still an issue that needs support? Is there any result or status you can share?

Thanks

akos.peter.szabo · June 23, 2020, 7:52am

Hi all,

We have put this topic on hold for now.
You can close this thread.
We will reopen it when needed.

Thanks,
Akos
