Can DeepStream 5.0 support dynamic input in the second GIE, or what should I do to support it?

I see that TensorRT 7 already supports dynamic input shapes, but the `get_converted_buffer` code at gstnvinfer.cpp:1012 in DeepStream 5.0 has not changed: it still scales every crop to the fixed input size of the second network. I want to use dynamic input for text recognition. What should I do to support it?
Ubuntu 18.04 + 1080 Ti
TensorRT 7

Sorry, I have closed the other topic. I also have a doubt about `get_converted_buffer`:
```c
static GstFlowReturn
get_converted_buffer (GstNvInfer * nvinfer, NvBufSurface * src_surf,
    NvBufSurfaceParams * src_frame, NvOSD_RectParams * crop_rect_params,
    NvBufSurface * dest_surf, NvBufSurfaceParams * dest_frame,
    gdouble & ratio_x, gdouble & ratio_y, void *destCudaPtr)
{
  guint src_left = GST_ROUND_UP_2 ((unsigned int)crop_rect_params->left);
  guint src_top = GST_ROUND_UP_2 ((unsigned int)crop_rect_params->top);
  guint src_width = GST_ROUND_DOWN_2 ((unsigned int)crop_rect_params->width);
  guint src_height = GST_ROUND_DOWN_2 ((unsigned int)crop_rect_params->height);
  guint dest_width, dest_height;

  if (nvinfer->maintain_aspect_ratio) {
    /* Calculate the destination width and height required to maintain
     * the aspect ratio. */
    double hdest = dest_frame->width * src_height / (double) src_width;
    double wdest = dest_frame->height * src_width / (double) src_height;
    int pixel_size;
    cudaError_t cudaReturn;

    if (hdest <= dest_frame->height) {
      dest_width = dest_frame->width;
      dest_height = hdest;
    } else {
      dest_width = wdest;
      dest_height = dest_frame->height;
    }

    switch (dest_frame->colorFormat) {
      case NVBUF_COLOR_FORMAT_RGBA:
        pixel_size = 4;
        break;
      case NVBUF_COLOR_FORMAT_RGB:
        pixel_size = 3;
        break;
      case NVBUF_COLOR_FORMAT_GRAY8:
      case NVBUF_COLOR_FORMAT_NV12:
        pixel_size = 1;
        break;
      default:
        g_assert_not_reached ();
        break;
    }

    /* Pad the scaled image with black color. */
    cudaReturn =
        cudaMemset2DAsync ((uint8_t *) destCudaPtr + pixel_size * dest_width,
        dest_frame->planeParams.pitch[0], 0,
        pixel_size * (dest_frame->width - dest_width), dest_frame->height,
        nvinfer->convertStream);
    if (cudaReturn != cudaSuccess) {
      GST_ERROR_OBJECT (nvinfer,
          "cudaMemset2DAsync failed with error %s while converting buffer",
          cudaGetErrorName (cudaReturn));
      return GST_FLOW_ERROR;
    }
    cudaReturn =
        cudaMemset2DAsync ((uint8_t *) destCudaPtr +
        dest_frame->planeParams.pitch[0] * dest_height,
        dest_frame->planeParams.pitch[0], 0, pixel_size * dest_width,
        dest_frame->height - dest_height, nvinfer->convertStream);
    if (cudaReturn != cudaSuccess) {
      GST_ERROR_OBJECT (nvinfer,
          "cudaMemset2DAsync failed with error %s while converting buffer",
          cudaGetErrorName (cudaReturn));
      return GST_FLOW_ERROR;
    }
  } else {
    dest_width = nvinfer->network_width;
    dest_height = nvinfer->network_height;
  }

  /* Calculate the scaling ratio of the frame / object crop. This will be
   * required later for rescaling the detector output boxes to input
   * resolution. */
  ratio_x = (double) dest_width / src_width;
  ratio_y = (double) dest_height / src_height;

  /* Create temporary src and dest surfaces for NvBufSurfTransform API. */
  nvinfer->tmp_surf.surfaceList[nvinfer->tmp_surf.numFilled] = *src_frame;

  /* Set the source ROI. Could be entire frame or an object. */
  nvinfer->transform_params.src_rect[nvinfer->tmp_surf.numFilled] =
      {src_top, src_left, src_width, src_height};
  /* Set the dest ROI. Could be the entire destination frame or part of it to
   * maintain aspect ratio. */
  nvinfer->transform_params.dst_rect[nvinfer->tmp_surf.numFilled] =
      {0, 0, dest_width, dest_height};

  return GST_FLOW_OK;
}
```

As shown above, the output crops of the first GIE are scaled to the fixed network input size. I am confused: when scaling the image, the remaining area is only filled with 0. Doesn't that reduce the robustness of the model? After porting my text recognition model to DeepStream, its accuracy dropped seriously. Could you give me some advice?

Now I see two possible reasons: first, my text recognition model needs dynamic input in the second GIE; second, the padding is filled with 0. Both affect the accuracy of the model.

Zero filling usually happens because the height-to-width ratio of your input image does not match the height-to-width ratio of the input shape defined by your model.

Multi-layer CNNs support dynamic inputs and outputs in most deep learning frameworks, and TensorRT supports dynamic width and height within a range defined by optimization-profile arguments when converting ONNX into a TensorRT engine.
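For reference, outside of DeepStream that range is declared with an optimization profile when building the engine. A hedged sketch using `trtexec` (shipped with TensorRT 7); the tensor name `input`, the file names, and the NCHW ranges below are placeholders to substitute with your recognizer's actual values:

```shell
# Hypothetical example: engine whose input width may vary from 32 to 320 px.
trtexec --onnx=recognizer.onnx \
        --explicitBatch \
        --minShapes=input:1x3x32x32 \
        --optShapes=input:1x3x32x160 \
        --maxShapes=input:1x3x32x320 \
        --saveEngine=recognizer.engine
```

This only helps when you run the engine yourself; as noted below, the nvinfer plugin still resizes every crop to one fixed width and height.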

However, as of now DeepStream supports dynamic batch size only; width and height are still limited to fixed sizes.
You may have to choose an optimal H and W that suit most of the cases in your scenario.
