YOLO SGIE problem with resize

I’m using back-to-back detectors with YoloV4, where the first detector detects a board and the second detector detects the characters within the board. I’m using enable-padding=0 and maintain-aspect-ratio=0 (I tried maintaining the aspect ratio, but the results were worse). What I’m observing is that pgie->sgie detects the boards but doesn’t detect the characters. However, when I manually create a video with only boards and use my previous sgie as the pgie, it works fine. Here are some more observations:

Input video dimension: 1280x720

Muxer dimension: 1280x720
pgie: works
sgie: doesn't work

Muxer dimension: 1920x1080
pgie: works
sgie: works half the time

Muxer dimension: 3000x2000
pgie: works
sgie: works 

Muxer dimension: 1280x720 
using sgie as pgie: works

In the end, the input dimension of both of my YoloV4 models is 608x608, so how is it that resizing to 608x608 from 1280x720 doesn’t work, but resizing from 3000x2000 does? Sometimes, when the board is smaller in a video, I have to increase the muxer dimension further to something ridiculous like 5000x8000, otherwise it doesn’t work. I have also read the gst-nvinfer code, but everything looks normal. Looking for some suggestions here!

• Hardware Platform: T4
• DeepStream Version: 5.0
• TensorRT Version: 7.0
• NVIDIA GPU Driver Version: 440.33.01
• Issue Type: question/bug


Any update on this post?

Hi @geralt_of_rivia,
When training your YoloV4, did you do the same preprocessing to the training images, i.e. maintain-aspect-ratio=0?

Thanks!

No, while training I didn’t set maintain-aspect-ratio=0, it was set to 1

So, you need to use maintain-aspect-ratio=1 in inference as well.
Also, in DeepStream, a classification gie runs inference on the detected object, so the object is cropped and resized to the size of the network input. With maintain-aspect-ratio=1 this adds padding, and currently the padding is placed at the bottom or on the right of the image, as shown below. You need to apply the same preprocessing in training when generating the model.

image
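To make the bottom/right padding concrete, here is a minimal sketch of the arithmetic, assuming the crop is scaled by the smaller of the two width/height ratios and the remainder is padded on the right or bottom only. `Letterbox` and `computeBottomRightPadding` are hypothetical names for illustration, not DeepStream APIs, and the exact rounding used by gst-nvinfer may differ.

```cpp
// Hypothetical result type: where the scaled crop lands in the network input.
struct Letterbox {
  int scaledWidth;
  int scaledHeight;
  int padRight;
  int padBottom;
};

// Hypothetical helper (not a DeepStream API): keeps the aspect ratio and
// puts all padding on the right/bottom, as described in the reply above.
constexpr Letterbox computeBottomRightPadding(int srcW, int srcH,
                                              int netW, int netH) {
  // Width is the limiting side when netW/srcW <= netH/srcH.
  // Cross-multiplied so the comparison stays in integers.
  const bool widthLimited =
      static_cast<long long>(netW) * srcH <=
      static_cast<long long>(netH) * srcW;
  const int scaledW =
      widthLimited ? netW
                   : static_cast<int>(static_cast<long long>(srcW) * netH / srcH);
  const int scaledH =
      widthLimited ? static_cast<int>(static_cast<long long>(srcH) * netW / srcW)
                   : netH;
  return Letterbox{scaledW, scaledH, netW - scaledW, netH - scaledH};
}
```

Under these assumptions, a 1280x720 source scaled into a 608x608 network input occupies 608x342 pixels, leaving 266 rows of padding at the bottom and none on the right. Training images would need the same layout for the model to see matching inputs.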

Thanks for your response. It turns out that in yolo training you need to set letter-box=1 to maintain the aspect ratio. I didn’t set it, so in my training I’m not maintaining the aspect ratio (I thought it did so by default, but my understanding was not correct).

I have written some custom code in gstnvinfer.cpp to save the image right before the sgie does inference, and I have two key observations:

  1. First, there seems to be something wrong with the transformation and resize function: random noise is added to the initial and latter parts of the image.
  2. Second, the image that was resized from 3000x2000 to 608x608 is much clearer than the one resized from 1280x720 to 608x608. This explains why the model performed very well on the 3000x2000 image but not on the 1280x720 one. There are notable differences between the images: the lines are much smoother in the second image, while in the first they are really jagged.

So I have two questions:

  1. Is there a way to control what algorithm is used to resize?
  2. What might be the cause of the noise being added to the image, how to resolve it?

Thanks

“scaling-filter” can be used to select the scaling algorithm from the enum NvBufSurfTransform_Inter in nvbufsurftransform.h.
By default, it’s NvBufSurfTransformInter_Default, you could try others, e.g. NvBufSurfTransformInter_Algo2

/**
 * Specifies video interpolation methods.
 */
typedef enum
{
  /** Specifies Nearest Interpolation Method interpolation. */
  NvBufSurfTransformInter_Nearest = 0,
  /** Specifies Bilinear Interpolation Method interpolation. */
  NvBufSurfTransformInter_Bilinear,
  /** Specifies GPU-Cubic, VIC-5 Tap interpolation. */
  NvBufSurfTransformInter_Algo1,
  /** Specifies GPU-Super, VIC-10 Tap interpolation. */
  NvBufSurfTransformInter_Algo2,
  /** Specifies GPU-Lanzos, VIC-Smart interpolation. */
  NvBufSurfTransformInter_Algo3,
  /** Specifies GPU-Ignored, VIC-Nicest interpolation. */
  NvBufSurfTransformInter_Algo4,
  /** Specifies GPU-Nearest, VIC-Nearest interpolation. */
  NvBufSurfTransformInter_Default
} NvBufSurfTransform_Inter;
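As a small aid for picking the number to put in the config file, the sketch below prints nothing and only exposes each enumerator's integer value. It assumes that scaling-filter takes the numeric value of the enumerator directly. The enum is a local copy of the one quoted above so the snippet compiles without the DeepStream headers, and `scalingFilterValue` is a hypothetical helper name.

```cpp
// Local copy of the enum quoted above. In a real build, include
// nvbufsurftransform.h instead of redefining it.
typedef enum
{
  NvBufSurfTransformInter_Nearest = 0,
  NvBufSurfTransformInter_Bilinear,
  NvBufSurfTransformInter_Algo1,
  NvBufSurfTransformInter_Algo2,
  NvBufSurfTransformInter_Algo3,
  NvBufSurfTransformInter_Algo4,
  NvBufSurfTransformInter_Default
} NvBufSurfTransform_Inter;

// Hypothetical helper: the integer to write as scaling-filter=<n>,
// assuming the config key maps one-to-one onto the enumerator value.
constexpr int scalingFilterValue(NvBufSurfTransform_Inter filter) {
  return static_cast<int>(filter);
}
```

With this ordering, NvBufSurfTransformInter_Algo2 evaluates to 3 and NvBufSurfTransformInter_Default to 6, so under the assumption above, scaling-filter=3 in the nvinfer config would select Algo2.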

Alright thanks, I will try that out. What about the issue (2) where there is noise around the image? What explains that?

Not sure for now!
As you can see, the image I shared above was dumped just before TRT inference, and there is no noise.

But in my case, it is coming out of the pgie, so the output I shared was captured before sgie inference. Any idea what the root cause might be? Could something go wrong between the pgie output and the sgie input? What can I do to diagnose the issue?

This is the code I wrote to save the images, can you review it? Maybe it might have some errors. nvinfer->resizedFrames_surf is a custom surface that I created to store the transformed surface. I created it in the following way:

 /* Zero-initialize so that fields not set below do not hold garbage. */
 NvBufSurfaceCreateParams create_params = {};
 create_params.gpuId = nvinfer->gpu_id;
 create_params.width = 608;
 create_params.height = 608;
 create_params.size = 0;
 create_params.colorFormat = NVBUF_COLOR_FORMAT_RGBA;
 create_params.layout = NVBUF_LAYOUT_PITCH;
 create_params.memType = NVBUF_MEM_CUDA_UNIFIED;

 /* Create the surface that holds the transformed (resized) frames. */
 if (NvBufSurfaceCreate(&nvinfer->resizedFrames_surf, 1, &create_params) != 0) {
   g_printf("\nError: Could not allocate internal buffer for resized frames");
   return false;
 }

And then I wrote these two functions to extract the image and save it to disk:

/* Running index used to give each saved image a unique file name. */
static int saved_image_count = 0;

static void save_transformed_plate_images(NvBufSurface * surface) {
  /* Map the buffer so that it can be accessed by CPU. Only buffer 0, plane 0
   * is mapped, which matches a surface created with a batch size of 1. */
  if (NvBufSurfaceMap(surface, 0, 0, NVBUF_MAP_READ) != 0) {
    g_printf("\nunable to map intermediate surface");
    return;
  }
  for (uint frameIndex = 0; frameIndex < surface->numFilled;
       frameIndex++) {
    /* Sync the mapped memory for CPU access before reading it. */
    NvBufSurfaceSyncForCpu(surface, frameIndex, 0);
    cv::Mat rgbFrame;
    /* Wrap the mapped RGBA data without copying it. A stack object avoids
     * leaking a heap-allocated cv::Mat on every call. */
    cv::Mat rgbaFrame(
        surface->surfaceList[frameIndex].height,
        surface->surfaceList[frameIndex].width, CV_8UC4,
        surface->surfaceList[frameIndex].mappedAddr.addr[0],
        surface->surfaceList[frameIndex].pitch);
#if (CV_MAJOR_VERSION >= 4)
    cv::cvtColor(rgbaFrame, rgbFrame, cv::COLOR_RGBA2BGR);
#else
    cv::cvtColor(rgbaFrame, rgbFrame, CV_RGBA2BGR);
#endif
    ++saved_image_count;
    std::string saveLocation =
        "../plates/img_" + std::to_string(saved_image_count) + std::string(".jpg");
    cv::imwrite(saveLocation, rgbFrame);
  }
  if (NvBufSurfaceUnMap(surface, 0, 0) != 0) {
    g_printf("\nunable to unmap intermediate surface");
    return;
  }
}

static gboolean
convert_batch_and_push_to_input_thread (GstNvInfer *nvinfer,
    GstNvInferBatch *batch, GstNvInferMemory *mem)
{
  NvBufSurfTransform_Error err = NvBufSurfTransformError_Success;
  std::string nvtx_str;
  /* Set the transform session parameters for the conversions executed in this
   * thread. */
  err = NvBufSurfTransformSetSessionParams (&nvinfer->transform_config_params);
  if (err != NvBufSurfTransformError_Success) {
    GST_ELEMENT_ERROR (nvinfer, STREAM, FAILED,
        ("NvBufSurfTransformSetSessionParams failed with error %d", err), (NULL));
    return FALSE;
  }
  nvtxEventAttributes_t eventAttrib = {0};
  eventAttrib.version = NVTX_VERSION;
  eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
  eventAttrib.colorType = NVTX_COLOR_ARGB;
  eventAttrib.color = 0xFFFF0000;
  eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
  nvtx_str = "convert_buf batch_num=" + std::to_string(nvinfer->current_batch_num);
  eventAttrib.message.ascii = nvtx_str.c_str();
  nvtxDomainRangePushEx(nvinfer->nvtx_domain, &eventAttrib);  
  if (batch->frames.size() > 0) {    
    /* Batched tranformation. */
    err = NvBufSurfTransform(&nvinfer->tmp_surf, mem->surf,
                             &nvinfer->transform_params);
  }
  
  if (err != NvBufSurfTransformError_Success) {
    GST_ELEMENT_ERROR (nvinfer, STREAM, FAILED,
        ("NvBufSurfTransform failed with error %d while converting buffer", err),
        (NULL));
    return FALSE;
  }
  
  /* Save the transformed plate images to disk when operating in secondary
   * mode. */
  if (err == NvBufSurfTransformError_Success && !nvinfer->process_full_frame &&
      batch->frames.size() > 0) {
    nvinfer->resizedFrames_surf->surfaceList->dataSize =
        mem->surf->surfaceList->dataSize;
    nvinfer->resizedFrames_surf->surfaceList->layout =
        mem->surf->surfaceList->layout;
    nvinfer->resizedFrames_surf->surfaceList->pitch =
        mem->surf->surfaceList->pitch;
    nvinfer->resizedFrames_surf->surfaceList->planeParams =
        mem->surf->surfaceList->planeParams;
    nvinfer->resizedFrames_surf->surfaceList->bufferDesc =
        mem->surf->surfaceList->bufferDesc;
    nvinfer->resizedFrames_surf->surfaceList->height =
        mem->surf->surfaceList->height;
    nvinfer->resizedFrames_surf->surfaceList->width =
        mem->surf->surfaceList->width;
    nvinfer->resizedFrames_surf->isContiguous = false;
    NvBufSurfaceMemSet(nvinfer->resizedFrames_surf, 0, 0, 0);
    err = NvBufSurfTransform(&nvinfer->tmp_surf, nvinfer->resizedFrames_surf, &nvinfer->transform_params);
    nvinfer->resizedFrames_surf->numFilled = nvinfer->tmp_surf.numFilled;
    if (err != NvBufSurfTransformError_Success) {
      GST_ELEMENT_ERROR(
          nvinfer, STREAM, FAILED,
          ("NvBufSurfTransform failed with error %d while converting buffer",
           err),
          (NULL));
      return FALSE;
    }
    save_transformed_plate_images(nvinfer->resizedFrames_surf);
    g_printf(
        "\n---------------->saved transformed plate images to disk prior to sgie "
        "detection");
  }
  nvtxDomainRangePop(nvinfer->nvtx_domain);
  LockGMutex locker (nvinfer->process_lock);
  /* Push the batch info structure in the processing queue and notify the output
   * thread that a new batch has been queued. */
  g_queue_push_tail (nvinfer->input_queue, batch);
  g_cond_broadcast (&nvinfer->process_cond);
  return TRUE;
}

dump_infer_input_to_file.patch.txt (8.8 KB)
Could you try attached change to dump the input just before calling TRT inference API - enqueue() ?

Thanks for the patch. When I use it, the images are displayed correctly. I will do a little more digging into why the results are flickering.

Any further update? Is this still an issue to support? Thanks

Thanks for the patch, it helped me debug my issue!

The sgie receives targets cropped from the original image (1280*720, 3000*2000, …).

A higher-resolution original image yields more pixels after cropping for the same target, so the sgie gets a smoother image with a 3000*2000 input than with a 1280*720 one. I don’t think this has anything to do with the interpolation mode used when resizing.
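The pixel-count argument can be checked with a rough sketch. It assumes a board that covers a fixed fraction of the frame width (20% here, an illustrative number not taken from the thread) and only counts pixels; it says nothing about how the muxer or nvinfer interpolate. `cropPixels` and `upscalePercent` are hypothetical helper names.

```cpp
// Hypothetical helper: side length in pixels of an object that covers a
// fixed fraction of the frame, given in per-mille to stay in integers.
constexpr int cropPixels(int frameSide, int fractionPerMille) {
  return frameSide * fractionPerMille / 1000;
}

// Hypothetical helper: upscale factor in percent (rounded down) needed to
// stretch the crop to the network input side.
constexpr int upscalePercent(int cropSide, int netSide) {
  return netSide * 100 / cropSide;
}

// Illustrative assumption: the board spans 20% of the frame width.
constexpr int kBoardWidthPerMille = 200;
constexpr int kNetworkSide = 608;

// Crop width at the two muxer resolutions discussed in the thread.
constexpr int kCropAt1280 = cropPixels(1280, kBoardWidthPerMille);  // 256 px
constexpr int kCropAt3000 = cropPixels(3000, kBoardWidthPerMille);  // 600 px
```

Under this assumption the crop is 256 pixels wide at a 1280-wide muxer and 600 pixels wide at a 3000-wide muxer, so the sgie has to stretch the former by roughly 237% to reach 608 but the latter by only about 101%.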