DeepStream nvinfer input tensor contains incorrect image

We’ve found that the DeepStream nvinfer input tensor sometimes contains an incorrect image when the GIE config option maintain-aspect-ratio=1 is used.

It looks like the input tensor for TRT inference is formed non-atomically, in two operations:

  1. set buffer to 0 (make black canvas)
  2. copy input image crop over

Occasionally a spurious crop from another object, on the same or a previous frame, is copied over before the crop of the correct object is copied.
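The suspected failure mode can be illustrated with a toy model. This is only a sketch of the hypothesis above, not nvinfer code: the canvas is a small list of lists, and clear(), paste() and form_tensor() are hypothetical names invented for the illustration. It shows that if the clear has not taken effect before the copy, part of a previous, wider crop survives next to the current crop.

```python
# Toy model of the suspected failure mode; NOT nvinfer code.
# All function names below are hypothetical.

def clear(canvas):
    """Step 1: fill the canvas with zeros (black)."""
    for row in canvas:
        for x in range(len(row)):
            row[x] = 0

def paste(canvas, crop_w, crop_h, value):
    """Step 2: copy a scaled crop into the top-left corner of the canvas."""
    for y in range(crop_h):
        for x in range(crop_w):
            canvas[y][x] = value

def form_tensor(canvas, crop_w, crop_h, value, clear_first=True):
    """Form one input tensor in the reused canvas."""
    if clear_first:
        clear(canvas)
    paste(canvas, crop_w, crop_h, value)
    return [row[:] for row in canvas]  # snapshot of the result

def demo(clear_first):
    canvas = [[0] * 8 for _ in range(4)]
    # Previous object: a wide crop (value 1) filling the full width.
    form_tensor(canvas, 8, 2, 1)
    # Current object: a narrow crop (value 2) in the same buffer.
    return form_tensor(canvas, 4, 4, 2, clear_first=clear_first)

if __name__ == "__main__":
    for row in demo(clear_first=False):
        print(row)
```

With clear_first=False the top rows keep values from the previous crop to the right of the current one, which matches the kind of composite image described here; with clear_first=True the padding is black.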

In effect the input tensor looks like:

The correct input tensor should look like:

The inference pipeline is created by gst_parse_launch():

appsrc name=ds_appsrc caps=video/x-raw,format=(string)BGR,width=(int)1920,height=(int)1080,framerate=(fraction)5/1 !
queue !
videoconvert ! video/x-raw,format=GRAY8 ! 
nvvideoconvert ! video/x-raw(memory:NVMM),format=NV12,colorimetry=bt601 !
m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 !
nvinfer name=nvinfer_cd config-file-path=/opt/models/cd/pgie-1.txt ! 
nvinfer name=nvinfer_lpd config-file-path=/opt/models/lpd/sgie-2_lpd.txt raw-output-file-write=1 ! 
fakesink sync=false

The input GstBuffers pushed to appsrc are created with gst_buffer_new_allocate().

I’ve attached the nvinfer configs:
pgie-1.txt (4.0 KB)
sgie-2_lpd.txt (3.6 KB)

$ jetson_release

  • NVIDIA Jetson TX2
    • Jetpack UNKNOWN [L4T 32.5.1]
    • NV Power Mode: MAXP_CORE_ARM - Type: 3
    • jetson_stats.service: active
  • Libraries:
    • CUDA: 10.2.89
    • cuDNN:
    • TensorRT:
    • Visionworks:
    • OpenCV: 4.1.1 compiled CUDA: NO
    • VPI: ii libnvvpi1 1.0.15 arm64 NVIDIA Vision Programming Interface library
    • Vulkan: 1.2.70
      DeepStream 5.1

Which DeepStream sample are you testing? Could you provide simple code to reproduce this issue?

The Python script (10.3 KB) is based on NVIDIA-AI-IOT/deepstream_python_apps on GitHub.

The problem can also be reproduced from the command line:

gst-launch-1.0 -e -v multifilesrc location=frame_%05d.jpg \
start-index=0 stop-index=-1 caps=image/jpeg,framerate=\(fraction\)25/2 \
! nvjpegdec \
! nvvideoconvert \
! capsfilter caps=video/x-raw\(memory:NVMM\),format=RGBA \
! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 \
! nvinfer config-file-path=pgie-1.txt \
! nvinfer config-file-path=sgie-2_lpd.txt raw-output-file-write=1 \
! nvdsosd ! nvvidconv ! nvjpegenc ! multifilesink location=out_dir/frame_%05d.jpg

Sample image:

Models are based on tlt_pretrained_object_detection_vresnet18/resnet_18.hdf5

Should I provide any additional information that could speed up the diagnosis?

Sorry for the late response. Some questions:

  1. What do your models do? Do you mean the tensor the sgie receives is wrong? nvinfer is open source, so you can add logs to narrow down this issue.
  2. About “The prob can be also reproduced by command line:”, I don’t have the two models. Could you provide the complete, simple code to reproduce this issue?

The models are RetinaNet models trained with TAO as a Car Detector and a License Plate Detector, according to the article. I believe the problem isn’t with the models.
I’ve dumped the input tensors of the License Plate Detector with raw-output-file-write=1 and converted them to images. Some dumps contain spurious data from other parts of the input frame. Please look at the two images I attached in the first message: the first image is incorrectly composed of two crops. It looks like a multi-thread access/locking issue:
1). the buffer is cleared because of maintain-aspect-ratio=1
2). the crop of the detected car is made according to one NvDsObjectMeta, scaled, and copied to the buffer.
Sometimes operation 2) appears to be repeated with some other NvDsObjectMeta for the same input tensor.
This causes spurious detections from the LPD.
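A check like the following could flag affected dumps automatically instead of inspecting images by eye. It is a minimal sketch under several assumptions that are not confirmed in this thread: the dump is a flat array of native-endian float32 values in planar CHW order, the scaled crop is anchored at the top-left with padding to the right and bottom, and black padding maps to pad_value after preprocessing (0.0 only if no offsets are configured). The helper names scaled_size(), stale_values() and load_float32() are hypothetical, and the size calculation only approximates the rounding nvinfer uses, hence the margin parameter.

```python
import array

def scaled_size(crop_w, crop_h, net_w, net_h):
    """Aspect-preserving size of a crop inside a net_w x net_h canvas."""
    if crop_w * net_h >= crop_h * net_w:  # limited by the network width
        return net_w, max(1, crop_h * net_w // crop_w)
    return max(1, crop_w * net_h // crop_h), net_h

def stale_values(tensor, channels, net_w, net_h, used_w, used_h,
                 margin=1, pad_value=0.0):
    """Count values in the padding region that differ from pad_value.

    tensor: flat sequence of floats in planar CHW order (assumed layout).
    margin: pixels of slack at the crop edge for rounding and filtering.
    """
    count = 0
    for c in range(channels):
        for y in range(net_h):
            row = (c * net_h + y) * net_w
            for x in range(net_w):
                inside = x < used_w + margin and y < used_h + margin
                if not inside and tensor[row + x] != pad_value:
                    count += 1
    return count

def load_float32(path):
    """Read a raw dump as native-endian float32 values."""
    values = array.array("f")
    with open(path, "rb") as f:
        values.frombytes(f.read())
    return values
```

For each dump, the object size from the corresponding NvDsObjectMeta would be passed to scaled_size(), and a non-zero result from stale_values() would mark the tensor as containing data outside the expected crop.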

Could this be a bug in DeepStream 5.1 that has already been fixed?
Could it be an incompatibility between nvvideoconvert/nvstreammux/nvinfer/nvjpegdec?
Could it be a hardware issue with the VIC? Maybe with NvBufSurfTransform? This is L4T 32.5.1.

Thank you.

After testing deepstream_lpr_app (GitHub - NVIDIA-AI-IOT/deepstream_lpr_app: Sample app code for LPR deployment on DeepStream) on Jetson Xavier with DeepStream 6.1, I can’t reproduce the issue: the tensor is correctly filled with a black border when maintain-aspect-ratio=1. Please try the new version.
You can use DeepStream SDK FAQ - #9 by mchi to dump the input tensor.

Unfortunately, our product system requirements restrict us to Jetson TX2, L4T 32.5.1, and DeepStream 5.1.
I know that DeepStream 6.1 uses NvBufSurfTransformAsync() to compose the input image for the secondary nvinfer, instead of the synchronous NvBufSurfTransform() used in DeepStream 5.1.
Could you ask your colleagues whether this was a known issue in DeepStream 5.1?

Thank you.

I did not find the same issue.

Could you please clarify?
The function get_converted_buffer() in sources/gst-plugins/gst-nvinfer/gstnvinfer.cpp
calls cudaMemset2DAsync() to clear the buffer when maintain-aspect-ratio=1.
These calls do not seem to be synchronized.
NvBufSurfTransform() is then called to scale and copy the image of the detected object into the input tensor.
My question is: where are these operations synchronized?

Thank you.

cudaMemset2DAsync is used to accelerate processing; there is no need to sync here.

Are you sure there is no need to sync?
From the CUDA documentation:
cudaMemset2DAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated to a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.
The operation that follows, NvBufSurfTransform(), is performed on the VIC by default.
I ask because a call to cudaStreamQuery() just before NvBufSurfTransform() sometimes returns cudaErrorNotReady, which means the buffer has not actually been cleared yet.

I also tested with deepstream_lpr_app (GitHub - NVIDIA-AI-IOT/deepstream_lpr_app: Sample app code for LPR deployment on DeepStream) on Jetson TX2 with DeepStream 5.1 and got the same issue: the tensor of the secondary detector sometimes contains stale data.

  1. Yes, there is only one CUDA stream, nvinfer->convertStream, which is passed to NvBufSurfTransform; to the user it is synchronous. With cudaMemset2DAsync the GPU starts processing without waiting until all data is received; compared with cudaMemset2D it is an “async” mode. Please compare DeepStream 5.1 and 6.1: there is no sync operation.
  2. About “I’ve dumped input tensors of License Plate Detector with raw-output-file-write=1 and converted them to images.”: we don’t know how you did that. We use section 3 of DeepStream SDK FAQ - #9 by mchi to dump the input tensor.

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
