Performing multiple inference in the nvinfer backend


I’m trying to extend the nvinfer backend so that it performs inference on an input surface more than once. For instance, in nvdsinfer_backend.cpp there’s an ImplicitTrtBackendContext::enqueueBuffer method that queues an input buffer for inference:

    if (!m_Context->enqueue(batchDims.batchSize, bindingBuffers.data(), stream,
            (consumeEvent ? &consumeEvent->ptr() : nullptr)))
        dsInferError("Failed to enqueue inference batch");

Here, bindingBuffers holds the input and output buffers of the model. My requirement is this: for an input buffer (say an image I1), I would like to perform inference twice, once on I1 and once on flip(I1), where flip is an image flip operation. Finally, I would like to average the two predictions (I think I can figure out this part).

How I imagine this would work:

  1. Extract the NvBufSurface which is the input image.
  2. Perform a buffer transform using NvBufferTransform_Flip by setting the flip transformation parameter to NvBufferTransform_FlipX to create a mirror image.
  3. Call .enqueue() on this batch
  4. Extract the GPU buffer, perform average operation.
  5. Return the GPU buffer
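
The averaging in step 4 can be sketched on the host as below. This is only a minimal illustration, not DeepStream code: the function name averagePredictions is hypothetical, the predictions are assumed to have already been copied into host vectors, and the model output is assumed to be non-spatial (e.g. class scores). A spatial output such as a segmentation mask would have to be flipped back before averaging.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise mean of the predictions obtained
// from the original image and from its flipped copy.
// Assumes both vectors hold the same output layer, in the same order.
std::vector<float> averagePredictions(const std::vector<float>& original,
                                      const std::vector<float>& flipped)
{
    assert(original.size() == flipped.size());

    std::vector<float> mean(original.size());
    for (std::size_t i = 0; i < original.size(); ++i) {
        mean[i] = 0.5f * (original[i] + flipped[i]);
    }
    return mean;
}
```

On the GPU the same per-element operation would map naturally to one thread per output value.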

I have worked out in theory how this should work. My questions are:

  1. Is my approach stated above correct? If not, how should I go about it?
  2. How do I access the NvBufSurface in the ImplicitTrtBackendContext::enqueueBuffer method which uses the CUDA Stream to perform inference?

• Hardware Platform: T4 / Jetson NX / Jetson TX2NX
• DeepStream Version: 5.1
• JetPack Version (valid for Jetson only): 4.5.1
• TensorRT Version: 7.2
• NVIDIA GPU Driver Version (valid for GPU only): 455
• Issue Type (questions, new requirements, bugs): question
• Requirement details: Editing nvdsinfer_backend.cpp to perform inference on a given image buffer twice

Hi @hatake_kakashi,

We recommend posting your query on the DeepStream forum. You may get better help there.

Thank you.

It’s already in the DeepStream forum right? Am I missing something?

Hi, any update on this?

Hey, we are checking it and will update you ASAP.

Thanks, I figured it out. I wrote a custom kernel to perform the flip operation because the FLIP flag in the buffer transform isn’t supported on dGPU. Furthermore, I edited the push-to-input-thread function to create a “duplicate” batch with the new buffers and append it to the original batch queue. This way every batch was doubled: one half with the original frames and the other half with the flipped ones. Later, I just wrote a simple probe to parse the tensor metadata and add both of them together. Thanks!
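
For anyone who wants to see the index mapping a flip kernel of this kind performs, here is a minimal CPU reference sketch in C++. It is not the kernel from this post: the name flipHorizontal is hypothetical, and the image is assumed to be tightly packed, interleaved 8-bit data. Real NvBufSurface planes are pitched, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for a horizontal (mirror) flip.
// Assumes a tightly packed, interleaved image: index = (y * width + x) * channels + c.
std::vector<std::uint8_t> flipHorizontal(const std::vector<std::uint8_t>& src,
                                         std::size_t width,
                                         std::size_t height,
                                         std::size_t channels)
{
    assert(src.size() == width * height * channels);

    std::vector<std::uint8_t> dst(src.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            // Column x of the source lands in column (width - 1 - x) of the output.
            const std::size_t mirroredX = width - 1 - x;
            for (std::size_t c = 0; c < channels; ++c) {
                dst[(y * width + mirroredX) * channels + c] =
                    src[(y * width + x) * channels + c];
            }
        }
    }
    return dst;
}
```

A CUDA version would typically compute the same mapping with one thread per pixel instead of the two outer loops.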

Great work! Thanks for sharing. By the way, how do you handle the batch size, since every batch is now twice the size of the original?

My engine was built with dynamic size, so the nvinfer backend worked just fine. In the gst-nvinfer plugin, however, I patched gst_nvinfer_output_loop to get rid of the redundant input frames:

    if (!batch->frames.empty() && nvinfer->enable_tta) {
        /* TTA is enabled, so half the batch needn't be processed */
        assert(batch->frames.size() % 2 == 0);
        batch->frames.resize(batch->frames.size() / 2);
    }

With this change, other features such as tracking history do not break and continue to work as before.
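
To make the effect of that resize concrete, here is a small standalone sketch. FrameInfo and dropAugmentedFrames are hypothetical stand-ins, not the plugin's own types, and the sketch assumes the duplicated (flipped) frames were appended after the originals, as described earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-frame bookkeeping kept in a batch.
struct FrameInfo {
    int frameId;
};

// Keeps only the first half of the batch, i.e. the original frames,
// assuming the flipped copies occupy the second half.
void dropAugmentedFrames(std::vector<FrameInfo>& frames)
{
    if (frames.empty()) {
        return;
    }
    assert(frames.size() % 2 == 0);
    frames.resize(frames.size() / 2);
}
```

Because only the trailing half is discarded, the remaining entries keep their order and identity, which is why downstream per-frame logic sees the same frames as before.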