YOLO SGIE problem with resize

geralt_of_rivia · September 30, 2020, 1:38pm

I’m using back-to-back detectors with YoloV4: the first detector detects a board and the second detects the characters within the board. I’m using enable-padding=0 and maintain-aspect-ratio=0 (I tried maintaining the aspect ratio, but the results were worse). What I’m observing is that in the pgie->sgie pipeline the boards are detected but the characters are not. However, when I manually create a video containing only boards and use my previous sgie as the pgie, it works fine. Here are some more observations:

Input video dimension: 1280x720

Muxer dimension: 1280x720
pgie: works
sgie: doesn't work

Muxer dimension: 1920x1080
pgie: works
sgie: works half the time

Muxer dimension: 3000x2000
pgie: works
sgie: works 

Muxer dimension: 1280x720 
using sgie as pgie: works

In the end, the input dimension of both of my YoloV4 models is 608x608, so why does resizing to 608x608 from 1280x720 not work while resizing from 3000x2000 does? Sometimes, when the board is smaller in a video, I have to increase the muxer dimension even further, to something ridiculous like 5000x8000, otherwise it doesn’t work. I have also read the gst-nvinfer code, but everything looks normal. Looking for some suggestions here!

• Hardware Platform: T4
• DeepStream Version: 5.0
• TensorRT Version: 7.0
• NVIDIA GPU Driver Version: 440.33.01
• Issue Type: question/bug

geralt_of_rivia · October 5, 2020, 5:35am

Any update on this post?

mchi · October 7, 2020, 5:50pm

Hi @geralt_of_rivia,
When training your YoloV4, did you do the same preprocessing to the training images, i.e. maintain-aspect-ratio=0?

Thanks!

geralt_of_rivia · October 7, 2020, 5:54pm

No, while training I didn’t set maintain-aspect-ratio=0, it was set to 1

mchi · October 8, 2020, 8:10am

So you need to use maintain-aspect-ratio=1 in inference as well.
Also, in DeepStream the secondary (classification) GIE runs inference on each detected object, so the object is cropped and resized to the size of the network input. With maintain-aspect-ratio=1 the resized crop is padded, and currently the padding is added at the bottom or on the right of the image, like below. You need to apply the same preprocessing in training when generating the model.
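The bottom/right padding described above can be sketched as plain geometry. This is only an illustration of the idea, not the gst-nvinfer implementation: the struct and function names are hypothetical, and truncating integer arithmetic is an assumption, so the real plugin may differ by a pixel.

```cpp
#include <cassert>

// Geometry of an aspect-ratio-preserving resize where all padding goes
// to the right and bottom edges. Names are hypothetical.
struct LetterboxGeometry {
  int scaled_w;    // width of the resized image content
  int scaled_h;    // height of the resized image content
  int pad_right;   // padding columns added on the right
  int pad_bottom;  // padding rows added at the bottom
};

// Assumption: truncating integer division; actual rounding may differ.
LetterboxGeometry letterbox_bottom_right(int src_w, int src_h,
                                         int net_w, int net_h) {
  assert(src_w > 0 && src_h > 0 && net_w > 0 && net_h > 0);
  LetterboxGeometry g{};
  // Compare src_w/src_h against net_w/net_h without floating point.
  if (static_cast<long long>(src_w) * net_h >=
      static_cast<long long>(src_h) * net_w) {
    // Source is relatively wider: its width fills the network input.
    g.scaled_w = net_w;
    g.scaled_h =
        static_cast<int>(static_cast<long long>(src_h) * net_w / src_w);
  } else {
    // Source is relatively taller: its height fills the network input.
    g.scaled_h = net_h;
    g.scaled_w =
        static_cast<int>(static_cast<long long>(src_w) * net_h / src_h);
  }
  g.pad_right = net_w - g.scaled_w;
  g.pad_bottom = net_h - g.scaled_h;
  return g;
}
```

For a hypothetical 200x100 crop and a 608x608 network input, this gives 608x304 of image content with 304 rows of padding at the bottom and none on the right. Training images would need the same layout for the model to see matching inputs.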

geralt_of_rivia · October 9, 2020, 12:04pm

Thanks for your response. It turns out that in YOLO training you need to set letter-box=1 to maintain the aspect ratio. I didn’t set it, so in my training I’m not maintaining the aspect ratio (I thought it did so by default, but my understanding was not correct).

I have written some custom code in gstnvinfer.cpp to save the image right before the sgie does inference, and have the following observations:

maintain-aspect-ratio=0, input-resolution=1280x720
[image: img-48, 608×608]

maintain-aspect-ratio=0, input-resolution=3000x2000
[image: img-158-2, 608×608]

maintain-aspect-ratio=1, input-resolution=3000x2000
[image: img-158, 608×608]

I have two key observations here:

Firstly, it seems there’s something wrong with the transformation and resize function: some random noise is being added to the beginning and end of the image.
Secondly, the image that was resized from 3000x2000 to 608x608 is much clearer than the one resized from 1280x720 to 608x608. This explains why the model performed very well on the 3000x2000 image but not on the 1280x720 one. There are notable differences between the images: the lines are much smoother in the second image, while in the first they are really jagged.

So I have two questions:

1. Is there a way to control which algorithm is used to resize?
2. What might be causing the noise added to the image, and how can I resolve it?

Thanks

mchi · October 9, 2020, 1:34pm

“scaling-filter” can be used to select the scaling algorithm, one of those provided in enum NvBufSurfTransform_Inter in nvbufsurftransform.h.
By default it’s NvBufSurfTransformInter_Default; you could try others, e.g. NvBufSurfTransformInter_Algo2.

/**
 * Specifies video interpolation methods.
 */
typedef enum
{
  /** Specifies Nearest Interpolation Method interpolation. */
  NvBufSurfTransformInter_Nearest = 0,
  /** Specifies Bilinear Interpolation Method interpolation. */
  NvBufSurfTransformInter_Bilinear,
  /** Specifies GPU-Cubic, VIC-5 Tap interpolation. */
  NvBufSurfTransformInter_Algo1,
  /** Specifies GPU-Super, VIC-10 Tap interpolation. */
  NvBufSurfTransformInter_Algo2,
  /** Specifies GPU-Lanzos, VIC-Smart interpolation. */
  NvBufSurfTransformInter_Algo3,
  /** Specifies GPU-Ignored, VIC-Nicest interpolation. */
  NvBufSurfTransformInter_Algo4,
  /** Specifies GPU-Nearest, VIC-Nearest interpolation. */
  NvBufSurfTransformInter_Default
} NvBufSurfTransform_Inter;
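As a sketch of how this might look in the sgie's nvinfer config file: the fragment below assumes that scaling-filter goes in the [property] group and takes the integer value of the enum member, counting from NvBufSurfTransformInter_Nearest = 0 as listed above. Please check this against the gst-nvinfer documentation for your DeepStream version.

```
[property]
# Assumed mapping from the enum above:
# 0=Nearest, 1=Bilinear, 2=Algo1, 3=Algo2, 4=Algo3, 5=Algo4, 6=Default
scaling-filter=3
```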

geralt_of_rivia · October 9, 2020, 1:42pm

Alright thanks, I will try that out. What about the issue (2) where there is noise around the image? What explains that?

mchi · October 9, 2020, 2:06pm

Not sure for now!
As you can see, the image I shared above was dumped just before TRT inference, and there is no noise.

geralt_of_rivia · October 9, 2020, 2:25pm

But in my case, it is coming out of the pgie, so the output I have shared is from just before sgie inference. Any idea what might be the root cause? Could something go wrong between the pgie output and the sgie input? What can I do to diagnose the issue?

geralt_of_rivia · October 10, 2020, 2:43pm

This is the code I wrote to save the images; can you review it? It might have some errors. nvinfer->resizedFrames_surf is a custom surface that I created to store the transformed surface. I created it in the following way:

 NvBufSurfaceCreateParams create_params = {};  // zero-initialize all fields
 create_params.gpuId = nvinfer->gpu_id;
 create_params.width = 608;
 create_params.height = 608;
 create_params.size = 0;
 create_params.colorFormat = NVBUF_COLOR_FORMAT_RGBA;
 create_params.layout = NVBUF_LAYOUT_PITCH;
 create_params.memType = NVBUF_MEM_CUDA_UNIFIED;

 // Create a surface for holding the transformed (resized) frames
 if (NvBufSurfaceCreate(&nvinfer->resizedFrames_surf, 1, &create_params) != 0) {
   g_printf("\nError: Could not allocate internal buffer for resized frames");
   return false;
 }

And then I wrote these two functions to extract the image and save it to disk

static void save_transformed_plate_images(NvBufSurface * surface) {
  /* Map the buffer so that it can be accessed by CPU */
  if (NvBufSurfaceMap(surface, 0, 0, NVBUF_MAP_READ) != 0) {
    g_printf("\nunable to map intermediate surface");
    return;
  }
  for (uint frameIndex = 0; frameIndex < surface->numFilled;
       frameIndex++) {
    NvBufSurfaceSyncForCpu(surface, frameIndex, 0);
    cv::Mat rgbFrame = cv::Mat(
        cv::Size(surface->surfaceList[frameIndex].width,
                 surface->surfaceList[frameIndex].height),
        CV_8UC3);
    /* Wrap the mapped RGBA buffer without copying. A stack object avoids
     * leaking a heap-allocated cv::Mat on every frame. */
    cv::Mat rgbaFrame(
        surface->surfaceList[frameIndex].height,
        surface->surfaceList[frameIndex].width, CV_8UC4,
        surface->surfaceList[frameIndex].mappedAddr.addr[0],
        surface->surfaceList[frameIndex].pitch);
#if (CV_MAJOR_VERSION >= 4)
    cv::cvtColor(rgbaFrame, rgbFrame, cv::COLOR_RGBA2BGR);
#else
    cv::cvtColor(rgbaFrame, rgbFrame, CV_RGBA2BGR);
#endif
    /* i is assumed to be a file-scope image counter (declaration not shown). */
    ++i;
    std::string saveLocation =
        "../plates/img_" + std::to_string(i) + std::string(".jpg");
    cv::imwrite(saveLocation, rgbFrame);
  }
  if (NvBufSurfaceUnMap(surface, 0, 0) != 0) {
    g_printf("\nunable to unmap intermediate surface");
    return;
  }
}
static gboolean
convert_batch_and_push_to_input_thread (GstNvInfer *nvinfer,
    GstNvInferBatch *batch, GstNvInferMemory *mem)
{
  NvBufSurfTransform_Error err = NvBufSurfTransformError_Success;
  std::string nvtx_str;
  /* Set the transform session parameters for the conversions executed in this
   * thread. */
  err = NvBufSurfTransformSetSessionParams (&nvinfer->transform_config_params);
  if (err != NvBufSurfTransformError_Success) {
    GST_ELEMENT_ERROR (nvinfer, STREAM, FAILED,
        ("NvBufSurfTransformSetSessionParams failed with error %d", err), (NULL));
    return FALSE;
  }
  nvtxEventAttributes_t eventAttrib = {0};
  eventAttrib.version = NVTX_VERSION;
  eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
  eventAttrib.colorType = NVTX_COLOR_ARGB;
  eventAttrib.color = 0xFFFF0000;
  eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
  nvtx_str = "convert_buf batch_num=" + std::to_string(nvinfer->current_batch_num);
  eventAttrib.message.ascii = nvtx_str.c_str();
  nvtxDomainRangePushEx(nvinfer->nvtx_domain, &eventAttrib);  
  if (batch->frames.size() > 0) {    
    /* Batched transformation. */
    err = NvBufSurfTransform(&nvinfer->tmp_surf, mem->surf,
                             &nvinfer->transform_params);
  }
  
  if (err != NvBufSurfTransformError_Success) {
    GST_ELEMENT_ERROR (nvinfer, STREAM, FAILED,
        ("NvBufSurfTransform failed with error %d while converting buffer", err),
        (NULL));
    return FALSE;
  }
  
  // save transformed plate images to disk  
  // save plates if operating in secondary mode
  if (err == NvBufSurfTransformError_Success && !nvinfer->process_full_frame &&
      batch->frames.size() > 0) {
    nvinfer->resizedFrames_surf->surfaceList->dataSize =
        mem->surf->surfaceList->dataSize;
    nvinfer->resizedFrames_surf->surfaceList->layout =
        mem->surf->surfaceList->layout;
    nvinfer->resizedFrames_surf->surfaceList->pitch =
        mem->surf->surfaceList->pitch;
    nvinfer->resizedFrames_surf->surfaceList->planeParams =
        mem->surf->surfaceList->planeParams;
    nvinfer->resizedFrames_surf->surfaceList->bufferDesc =
        mem->surf->surfaceList->bufferDesc;
    nvinfer->resizedFrames_surf->surfaceList->height =
        mem->surf->surfaceList->height;
    nvinfer->resizedFrames_surf->surfaceList->width =
        mem->surf->surfaceList->width;
    nvinfer->resizedFrames_surf->isContiguous = false;
    NvBufSurfaceMemSet(nvinfer->resizedFrames_surf, 0, 0, 0);
    err = NvBufSurfTransform(&nvinfer->tmp_surf, nvinfer->resizedFrames_surf, &nvinfer->transform_params);
    nvinfer->resizedFrames_surf->numFilled = nvinfer->tmp_surf.numFilled;
    if (err != NvBufSurfTransformError_Success) {
      GST_ELEMENT_ERROR(
          nvinfer, STREAM, FAILED,
          ("NvBufSurfTransform failed with error %d while converting buffer",
           err),
          (NULL));
      return FALSE;
    }
    save_transformed_plate_images(nvinfer->resizedFrames_surf);
    g_printf(
        "\n---------------->saved transformed plate images to disk prior to sgie "
        "detection");
  }
  nvtxDomainRangePop(nvinfer->nvtx_domain);
  LockGMutex locker (nvinfer->process_lock);
  /* Push the batch info structure in the processing queue and notify the output
   * thread that a new batch has been queued. */
  g_queue_push_tail (nvinfer->input_queue, batch);
  g_cond_broadcast (&nvinfer->process_cond);
  return TRUE;
}

mchi · October 11, 2020, 9:31am

dump_infer_input_to_file.patch.txt (8.8 KB)
Could you try attached change to dump the input just before calling TRT inference API - enqueue() ?

geralt_of_rivia · October 14, 2020, 8:42am

Thanks for the patch. When I use it, the images are displayed correctly. I will do a little more digging into why the results are flickering.

kayccc · October 20, 2020, 2:58am

Any further update? Is this still an issue to support? Thanks

geralt_of_rivia · October 25, 2020, 9:30am

Thanks for the patch, it helped me debug my issue!

zhouzhi9 · February 19, 2021, 8:05am

The sgie receives targets cropped from the original image (1280*720, 3000*2000, …).

A higher-resolution original image yields more pixels after cropping for the same target, so the sgie gets a smoother image with a 3000*2000 input than with 1280*720. I think it has nothing to do with the interpolation mode used when resizing.
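The pixel-count argument can be made concrete with a little arithmetic. This is a rough sketch: the 200x100 board size is a made-up example, the names are hypothetical, and it assumes the crop simply scales with the muxer resolution.

```cpp
#include <cassert>

// Size of an object crop, in pixels, after the muxer has scaled the frame.
struct CropSize {
  int w;
  int h;
};

// box_w/box_h: object size measured in the source video frame.
// Assumption: the crop scales linearly with the muxer dimensions
// (truncating integer division).
CropSize crop_at_muxer_resolution(int box_w, int box_h,
                                  int video_w, int video_h,
                                  int muxer_w, int muxer_h) {
  assert(video_w > 0 && video_h > 0);
  CropSize c{};
  c.w = static_cast<int>(static_cast<long long>(box_w) * muxer_w / video_w);
  c.h = static_cast<int>(static_cast<long long>(box_h) * muxer_h / video_h);
  return c;
}
```

For a hypothetical 200x100 board in a 1280x720 video, the crop stays 200x100 with a 1280x720 muxer and must be enlarged about 3x to reach 608 pixels wide. With a 3000x2000 muxer the crop is about 468x277 and needs only about 1.3x. Note that when the source video itself is 1280x720, the extra pixels come from the muxer's own upscaling rather than from additional detail in the source.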
