The models are RetinaNet models trained with TAO for the Car Detector and the License Plate Detector, as described in the article. I believe the problem isn’t with the models.
I’ve dumped the input tensors of the License Plate Detector with raw-output-file-write=1 and converted them to images. Some dumps contain spurious data from other parts of the input frame. Please look at the two images I attached to my first message: the first image is incorrectly composed of two crops. It looks like a multi-threaded access/locking issue:
1) the buffer is cleared because of maintain-aspect-ratio=1;
2) the crop of the detected car is made according to one of the NvDsObjectMeta, scaled, and copied to the buffer.
Sometimes operation 2) is repeated with another NvDsObjectMeta for the same input tensor.
This causes spurious detections from the LPD.
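For reference, the two steps above can be written as a single-threaded CPU function that shows what a correct tensor should contain. This is a hypothetical sketch, not nvinfer’s code: `Image` and `letterbox()` are illustrative names, the tensor is single-channel 8-bit for brevity, scaling is nearest-neighbor, and the scaled crop is assumed to sit in the top-left corner with padding on the right and bottom. With exactly one crop per tensor, everything outside the scaled rectangle must be zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical single-channel 8-bit image, row-major.
struct Image {
    int w = 0, h = 0;
    std::vector<std::uint8_t> px;
    std::uint8_t at(int x, int y) const { return px[static_cast<std::size_t>(y) * w + x]; }
};

// CPU reference of the two steps for ONE object crop:
//   1) clear the whole tensor (maintain-aspect-ratio=1),
//   2) scale the crop keeping its aspect ratio and copy it to the top-left corner.
Image letterbox(const Image& crop, int netW, int netH) {
    assert(crop.w > 0 && crop.h > 0 && netW > 0 && netH > 0);
    assert(crop.px.size() == static_cast<std::size_t>(crop.w) * crop.h);
    // Step 1: every element starts at zero.
    Image dst{netW, netH, std::vector<std::uint8_t>(static_cast<std::size_t>(netW) * netH, 0)};
    // Step 2: one scale factor for both axes, so the crop is not distorted.
    const double scale = std::min(static_cast<double>(netW) / crop.w,
                                  static_cast<double>(netH) / crop.h);
    const int outW = std::min(netW, std::max(1, static_cast<int>(crop.w * scale)));
    const int outH = std::min(netH, std::max(1, static_cast<int>(crop.h * scale)));
    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            // Nearest-neighbor sampling from the source crop.
            const int sx = std::min(crop.w - 1, static_cast<int>(x / scale));
            const int sy = std::min(crop.h - 1, static_cast<int>(y / scale));
            dst.px[static_cast<std::size_t>(y) * netW + x] = crop.at(sx, sy);
        }
    }
    return dst;
}
```

With a 4x2 crop and an 8x8 tensor, the crop fills rows 0 to 3 and rows 4 to 7 stay zero. Any non-zero pixel in the padding of a dumped tensor would therefore indicate stale or foreign data, which is consistent with the ordering problem suspected here rather than with a model issue.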
Could it be a bug in DeepStream 5.1 that has already been fixed?
Could it be an incompatibility between nvvideoconvert/nvstreammux/nvinfer/nvjpegdec?
Could it be a hardware issue with the VIC, perhaps in NvBufSurfTransform? This is L4T 32.5.1.
Hi
Unfortunately, our product’s system requirements restrict us to Jetson TX2, L4T 32.5.1, and DeepStream 5.1.
I know that DeepStream 6.1 uses NvBufSurfTransformAsync() to compose the input image for the secondary nvinfer, instead of the synchronous NvBufSurfTransform() used in DeepStream 5.1.
Could you ask your colleagues whether this was a known issue in DeepStream 5.1?
Hi
Could you please clarify.
The function get_converted_buffer() in sources/gst-plugins/gst-nvinfer/gstnvinfer.cpp
calls cudaMemset2DAsync() to clear the buffer when maintain-aspect-ratio=1.
These calls do not seem to be synchronized.
Then NvBufSurfTransform() is called to scale/copy the image of the detected object into the input tensor.
My question is: when are these operations synchronized?
Hi
Are you sure there is no need to synchronize?
From the CUDA documentation: cudaMemset2DAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated to a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.
The subsequent NvBufSurfTransform() operation is performed on the VIC by default.
I ask because calling cudaStreamQuery() just before NvBufSurfTransform() sometimes returns cudaErrorNotReady, that is, the buffer has not actually been cleared yet.
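A toy C++ model can show why the order matters when the clear and the copy run on different engines. It is an assumption-laden sketch, not DeepStream or CUDA code: `LazyStream` stands in for a CUDA stream and only executes queued work when `synchronize()` is called (the worst case, where the GPU has not reached the memset yet), `runVicCopy()` and the values 7 and 1 are hypothetical, and the direct write to the tensor models a VIC transform that the stream does not order. Without a wait, the new crop lands next to stale data from a previous object, the same "two crops in one tensor" picture as in the dumps; with a wait before the copy, the padding is clean.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy model, NOT the CUDA runtime: work is recorded and only runs on synchronize().
class LazyStream {
public:
    void enqueue(std::function<void()> op) { pending_.push(std::move(op)); }
    // Analogue of cudaStreamQuery(): true once no queued work remains.
    bool query() const { return pending_.empty(); }
    // Analogue of cudaStreamSynchronize(): run queued work in FIFO order.
    void synchronize() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }
private:
    std::queue<std::function<void()>> pending_;
};

struct Observation {
    bool clearedBeforeCopy;      // what the query reported just before the copy
    std::vector<int> afterCopy;  // tensor contents right after the copy returns
};

// The tensor still holds data from a previous object (7s). The clear goes to the
// stream; the new crop (1) is written by an engine the stream does not order.
Observation runVicCopy(bool syncBeforeCopy) {
    std::vector<int> tensor{7, 7, 7, 7};
    LazyStream stream;
    stream.enqueue([&tensor] { std::fill(tensor.begin(), tensor.end(), 0); });
    if (syncBeforeCopy) stream.synchronize();
    const bool ready = stream.query();
    tensor[0] = 1;                       // new crop covers element 0 only
    Observation obs{ready, tensor};
    stream.synchronize();                // a late clear would now wipe the crop
    assert(stream.query());
    return obs;
}
```

In the plugin, the analogous change would presumably be a cudaStreamSynchronize() on the stream used for cudaMemset2DAsync() before a VIC-mode NvBufSurfTransform(); this is a hypothesis to test, not a confirmed fix. If the transform instead ran as GPU work on that same stream, it would be queued behind the memset and the order would hold without a host-side wait.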
1. Yes, there is only one CUDA stream, nvinfer->convertStream, which is passed to NvBufSurfTransform, so to the user it is synchronous. With cudaMemset2DAsync, the GPU starts processing without waiting for all the data to be received; compared with cudaMemset2D, it is an “async” mode. Please compare DeepStream 5.1 and 6.1: there is no sync operation in either.
2. About "I’ve dumped input tensors of License Plate Detector with raw-output-file-write=1 and converted them to images.": I don’t know how you did that. We use section 3 of DeepStream SDK FAQ - #9 by mchi to dump the input tensor.
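Whichever dump method is used, the raw tensor only becomes viewable once it is turned back into pixels. A minimal sketch, assuming the dump is planar float32 CHW with three channels in R, G, B order and that preprocessing followed nvinfer’s y = net-scale-factor * (x - mean) with per-channel offsets; `chwToPpm()`, the channel order and the PPM output format are choices for illustration, not part of DeepStream.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <vector>

// Writes a planar float32 CHW tensor (R, G, B planes) as a binary PPM (P6).
// Undoes y = scale * (x - offset) per channel, then rounds and clamps to 0..255.
void chwToPpm(const std::vector<float>& chw, int width, int height,
              float scale, const float offsets[3], std::ostream& out) {
    assert(width > 0 && height > 0 && scale != 0.0f);
    const std::size_t plane = static_cast<std::size_t>(width) * height;
    assert(chw.size() == 3 * plane);
    out << "P6\n" << width << ' ' << height << "\n255\n";
    for (std::size_t i = 0; i < plane; ++i) {
        for (std::size_t c = 0; c < 3; ++c) {
            float v = chw[c * plane + i] / scale + offsets[c];
            v = std::clamp(std::round(v), 0.0f, 255.0f);
            out.put(static_cast<char>(static_cast<std::uint8_t>(v)));
        }
    }
}
```

Passing a std::ofstream opened in binary mode produces a file that most image viewers open directly; a std::ostringstream is convenient for quick checks. If the model was configured for BGR input, the channel order in the inner loop would need to be reversed.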
There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks