DeepStream test1 app error: ERROR from element primary-nvinference-engine: Infer operation failed

Hello, I am experimenting with DeepStream SDK 3.0. When I run test1-app with the following command:

./deepstream-test1-app ~/DeepStream_Release/samples/streams/sample_720p.h264

the output reports “ERROR from element primary-nvinference-engine: Infer operation failed”.

The detailed output is:

Now playing: /root/DeepStream_Release/samples/streams/sample_720p.h264
>>> Generating new TRT model engine
Using FP32 data type.

 ***** Storing serialized engine file as /root/DeepStream_Release/sources/apps/sample_apps/deepstream-test1/../../../../samples/models/Primary_Detector/resnet10.caffemodel_b1_fp32.engine batchsize = 1 *****

Running...
Frame Number = 0 Number of objects = 0 Vehicle Count = 0 Person Count = 0
cuda/cudaFusedConvActLayer.cpp (287) - Cuda Error in executeFused: 48
cuda/cudaFusedConvActLayer.cpp (287) - Cuda Error in executeFused: 48
Enqueue failed during inference
ERROR from element primary-nvinference-engine: Infer operation failed
Error details: gstnvinfer.c(781): gst_nvinfer_inference_thread (): /GstPipeline:dstest1-pipeline/GstNvInfer:primary-nvinference-engine
Returned, stopping playback
Frame Number = 1 Number of objects = 0 Vehicle Count = 0 Person Count = 0
cuda/cudaFusedConvActLayer.cpp (287) - Cuda Error in executeFused: 48
cuda/cudaFusedConvActLayer.cpp (287) - Cuda Error in executeFused: 48
Enqueue failed during inference
Deleting pipeline
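For reference, here is a minimal sketch (my own check, not part of the sample) that decodes the numeric error TensorRT prints. On CUDA 10.0, code 48 should correspond to cudaErrorNoKernelImageForDevice, i.e. no kernel image is available for execution on the device:

#include <stdio.h>
#include <cuda_runtime.h>

/* Decode the numeric CUDA runtime error reported by TensorRT.
 * On CUDA 10.0, code 48 should map to cudaErrorNoKernelImageForDevice. */
int main(void)
{
    printf("CUDA error 48: %s\n", cudaGetErrorString((cudaError_t) 48));
    return 0;
}

Built with, e.g., nvcc decode_err.cu -o decode_err (the file name is arbitrary).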

By the way, I replaced nveglglessink with fakesink (sketched below) and changed network-mode to 0, i.e. FP32.
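A minimal sketch of the sink change in deepstream_test1_app.c, assuming the stock DeepStream 3.0 test1 source (create_sink is a hypothetical helper for illustration; the sample does this inline):

#include <gst/gst.h>

/* Hypothetical helper showing the one-line change in
 * deepstream_test1_app.c: build the sink headless. The original
 * sample creates "nveglglessink" here. */
static GstElement *
create_sink (void)
{
  /* was: gst_element_factory_make ("nveglglessink", "nvvideo-renderer"); */
  return gst_element_factory_make ("fakesink", "nvvideo-renderer");
}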

The environment:
OS: Ubuntu 16.04
GPU: K40c
NVIDIA Driver Version: 410.78
CUDA Version: 10.0
TensorRT: 5.0.2.6
cuDNN: 7.3.1
nvidia-docker: 1.0.1
Docker: 18.09.0

What might be the problem?

Hi,
Can you try running the command below? Change the paths to match your setup:
~/work/cpxavier/TensorRT-5.0.2/usr/src/tensorrt/bin/trtexec --deploy=/home/tse/work/dssource/DeepStreamSDK/Model/IVAPrimary_resnet10_DeepstreamRel_V2_ivalarge_its_phase1/resnet10.prototxt --output=conv2d_bbox --output=conv2d_cov --batch=2 --device=1 --int8
My system has two NVIDIA GPU cards; here I use GPU ID 1, which is a P4 card. Change it to match yours, and report back with the result. Thanks.

Hi amycao,
here are the results:

  1. Without INT8
root@cb9d71d14ade:~/DeepStream_Release/samples/models/Primary_Detector# /usr/local/TensorRT-5.0.2.6/bin/trtexec --deploy=resnet10.prototxt --output=conv2d_bbox --output=conv2d_cov --batch=2 --device=0
deploy: resnet10.prototxt
output: conv2d_bbox
output: conv2d_cov
batch: 2
device: 0
Input "input_1": 3x368x640
Output "conv2d_bbox": 16x23x40
Output "conv2d_cov": 4x23x40
name=input_1, bindingIndex=0, buffers.size()=3
name=conv2d_bbox, bindingIndex=1, buffers.size()=3
name=conv2d_cov, bindingIndex=2, buffers.size()=3
Average over 10 runs is 9.6178 ms (host walltime is 9.81203 ms, 99% percentile time is 9.64509).
Average over 10 runs is 9.61194 ms (host walltime is 9.81955 ms, 99% percentile time is 9.63098).
Average over 10 runs is 9.60836 ms (host walltime is 9.80102 ms, 99% percentile time is 9.64755).
Average over 10 runs is 9.60653 ms (host walltime is 9.79853 ms, 99% percentile time is 9.6416).
Average over 10 runs is 9.60748 ms (host walltime is 9.80463 ms, 99% percentile time is 9.62582).
Average over 10 runs is 9.61442 ms (host walltime is 9.81987 ms, 99% percentile time is 9.63053).
Average over 10 runs is 9.61797 ms (host walltime is 9.82143 ms, 99% percentile time is 9.65859).
Average over 10 runs is 9.62259 ms (host walltime is 9.8242 ms, 99% percentile time is 9.65754).
Average over 10 runs is 9.62715 ms (host walltime is 9.83242 ms, 99% percentile time is 9.66416).
Average over 10 runs is 9.61373 ms (host walltime is 9.81387 ms, 99% percentile time is 9.63581).
  2. With INT8
root@cb9d71d14ade:~/DeepStream_Release/samples/models/Primary_Detector# /usr/local/TensorRT-5.0.2.6/bin/trtexec --deploy=resnet10.prototxt --output=conv2d_bbox --output=conv2d_cov --batch=2 --device=0 --int8
deploy: resnet10.prototxt
output: conv2d_bbox
output: conv2d_cov
batch: 2
device: 0
int8
Input "input_1": 3x368x640
Output "conv2d_bbox": 16x23x40
Output "conv2d_cov": 4x23x40
Int8 support requested on hardware without native Int8 support, performance will be negatively affected.
name=input_1, bindingIndex=0, buffers.size()=3
name=conv2d_bbox, bindingIndex=1, buffers.size()=3
name=conv2d_cov, bindingIndex=2, buffers.size()=3
Average over 10 runs is 9.61389 ms (host walltime is 9.73778 ms, 99% percentile time is 9.63885).
Average over 10 runs is 9.62688 ms (host walltime is 9.75592 ms, 99% percentile time is 9.64278).
Average over 10 runs is 9.62987 ms (host walltime is 9.75898 ms, 99% percentile time is 9.65258).
Average over 10 runs is 9.63567 ms (host walltime is 9.76663 ms, 99% percentile time is 9.65658).
Average over 10 runs is 9.62528 ms (host walltime is 9.75364 ms, 99% percentile time is 9.6543).
Average over 10 runs is 9.6302 ms (host walltime is 9.75916 ms, 99% percentile time is 9.65443).
Average over 10 runs is 9.62786 ms (host walltime is 9.75631 ms, 99% percentile time is 9.6544).
Average over 10 runs is 9.62383 ms (host walltime is 9.75005 ms, 99% percentile time is 9.64026).
Average over 10 runs is 9.62651 ms (host walltime is 9.7801 ms, 99% percentile time is 9.63904).
Average over 10 runs is 9.61735 ms (host walltime is 9.78592 ms, 99% percentile time is 9.63341).

How many NVIDIA GPU cards are in the system, and which card is device ID 0?

There are 4 K40c GPU cards, so device 0 is a K40c.

By the way, the host machine runs CentOS 7.4.1708. CUDA 10.0 is mapped from a host directory into the container when running nvidia-docker. The full container start command is:

nvidia-docker run -it -w /root -v /usr/local/cuda-10.0:/usr/local/cuda deepstream_docker_image bash

and all of the above tests were run inside the container.
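In case it helps, here is a minimal CUDA sketch (my own check, not part of the SDK) that I can run inside the container to confirm which devices it sees; a K40c is Kepler and should report compute capability 3.5:

#include <stdio.h>
#include <cuda_runtime.h>

/* List every GPU visible in the container with its compute capability. */
int main(void)
{
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed\n");
        return 1;
    }
    for (int i = 0; i < n; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}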

Can you run directly on the system, without Docker, to rule out an environment issue?

As far as I know, DeepStream cannot run on CentOS 7.4 because its gcc libraries are too old, which is why I have to run it inside a Docker container.

I have tried another Ubuntu 16.04 host machine with a GTX 960 card, and the test app works fine inside Docker there.

So this might be an environment issue, but I’m not sure.

Hi,
DeepStream does not support the CentOS platform at this time.