NvBufSurfTransform failed with error -3 while converting buffer

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
GPU
• DeepStream Version
6.2
• NVIDIA GPU Driver Version (valid for GPU only)
525.105.17
• Issue Type( questions, new requirements, bugs)
questions
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

My program has a primary_bin (nvinfer) before the tracker and several secondary detectors/classifiers after the tracker. I occasionally got an error like this after adding more than 50 sources:

gstnvtracker: NvBufSurfTransform failed with error -3 while converting buffer
gstnvtracker: Failed to convert input batch.

Here’s my tracker configuration:

BaseConfig:
  minDetectorConfidence: 0   # If the confidence of a detector bbox is lower than this, then it won't be considered for tracking

TargetManagement:
  preserveStreamUpdateOrder: 0 # When assigning new target ids, preserve input streams' order to keep target ids in a deterministic order over multiple runs
  maxTargetsPerStream: 150  # Max number of targets to track per stream. Recommended to set >10. Note: this value should account for the targets being tracked in shadow mode as well. Max value depends on the GPU memory capacity

  # [Creation & Termination Policy]
  minIouDiff4NewTarget: 0.5   # If the IOU between the newly detected object and any of the existing targets is higher than this threshold, this newly detected object will be discarded.
  minTrackerConfidence: 0.2   # If the confidence of an object tracker is lower than this on the fly, then it will be tracked in shadow mode. Valid Range: [0.0, 1.0]
  probationAge: 0 # If the target's age exceeds this, the target will be considered to be valid.
  maxShadowTrackingAge: 30   # Max length of shadow tracking. If the shadowTrackingAge exceeds this limit, the tracker will be terminated.
  earlyTerminationAge: 1   # If the shadowTrackingAge reaches this threshold while in TENTATIVE period, the target will be terminated prematurely.

TrajectoryManagement:
  useUniqueID: 1   # Use 64-bit long Unique ID when assigning tracker ID.

DataAssociator:
  dataAssociatorType: 0 # the type of data associator among { DEFAULT= 0 }
  associationMatcherType: 0 # the type of matching algorithm among { GREEDY=0, GLOBAL=1 }
  checkClassMatch: 1  # If checked, only the same-class objects are associated with each other. Default: true

  # Thresholds in matching scores to be considered as a valid candidate for matching
  minMatchingScore4Overall: 0.8   # Min total score
  minMatchingScore4SizeSimilarity: 0.6  # Min bbox size similarity score
  minMatchingScore4Iou: 0.0       # Min IOU score
  thresholdMahalanobis: 9.4877    # Max Mahalanobis distance based on Chi-square probabilities

StateEstimator:
  stateEstimatorType: 2  # the type of state estimator among { DUMMY=0, SIMPLE=1, REGULAR=2 }

  # [Dynamics Modeling]
  noiseWeightVar4Loc: 0.05  # weight of process and measurement noise for bbox center; if set, location noise will be proportional to box height
  noiseWeightVar4Vel: 0.00625  # weight of process and measurement noise for velocity; if set, velocity noise will be proportional to box height
  useAspectRatio: 1 # use aspect ratio in Kalman filter's observation

ReID:
  reidType: 1 # the type of reid among { DUMMY=0, DEEP=1 }
  batchSize: 128 # batch size of reid network
  workspaceSize: 1000 # workspace size to be used by reid engine, in MB
  reidFeatureSize: 128 # size of reid feature
  reidHistorySize: 100 # max number of reid features kept for one object
  inferDims: [128, 64, 3] # reid network input dimension CHW or HWC based on inputOrder
  inputOrder: 1 # reid network input order among { NCHW=0, NHWC=1 }
  colorFormat: 0 # reid network input color format among {RGB=0, BGR=1 }
  networkMode: 1 # reid network inference precision mode among {fp32=0, fp16=1, int8=2 }
  offsets: [2.1179039, 2.03571, 1.80444]  # array of values to be subtracted from each input channel, with length equal to number of channels
  netScaleFactor: 0.0174291938997821 # scaling factor for reid network input after subtracting offsets
  inputBlobName: "input" # reid network input layer name
  outputBlobName: "output" # reid network output layer name
  #uffFile: "/opt/nvidia/deepstream/deepstream/samples/models/Tracker/mars-small128.uff" # absolute path to reid network uff model
  modelEngineFile: "myenginefile" # engine file path
  keepAspc: 0 # whether to keep aspect ratio when resizing input objects for reid
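
This YAML is passed to nvtracker through its ll-config-file property; a minimal sketch, where the file path is an assumption for illustration:

    /* Hypothetical: the config file path is illustrative */
    g_object_set (G_OBJECT (m_tracker),
        "ll-config-file", "my_tracker_config.yml", NULL);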

The other properties are set in code:

    g_object_set (G_OBJECT (m_tracker),
        "tracker-width", 640,
        "tracker-height", 384,
        "gpu-id", 0,
        "ll-lib-file", "/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so",
        "enable-batch-process", 1,
        "enable-past-frame", 1,
        "display-tracking-id", 1,
        "compute-hw", 1,   /* 1 = GPU */
        NULL);

I set “crop-objects-to-roi-boundary: 1” on primary_bin, so the object rects should be correct. What else should I do?
Btw, which option should I set on the tracker if I want it to skip specific class IDs from the detector?

  1. What is the GPU device model?
  2. Can you reproduce this issue based on a DeepStream sample? For example, you could modify source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.yml to support 50 sources and a tracker.

Why do you need to do this? For performance?

I’m using a T4 card. It’s very hard to reproduce since it may only occur after running for several hours, so I’d like to know whether there is any way to find out what is wrong. For example, is there some kind of width or height limit, such as a minimum required value?

And for this question,

Why do you need to do this? For performance?

Yes, I need to do this for better performance. For example, my pipeline is like this:
detector1->detector2->tracker
detector2 is based on detector1, and I don’t need the results from detector2 to pass through the tracker.

You can use detector1->tracker->detector2.
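
A minimal sketch of that link order; the element variable names are assumptions for illustration:

    /* Hypothetical: relink so detector2 runs after the tracker;
     * element variable names are assumptions for illustration. */
    if (!gst_element_link_many (streammux, detector1, tracker,
            detector2, sink, NULL))
      g_printerr ("Failed to link detector1 -> tracker -> detector2\n");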

Error code -3 means NvBufSurfTransformError_Invalid_Params. What are the 50 sources, RTSP streams or local files? Which sample are you referring to?
First, we suggest monitoring CPU and memory usage; please refer to memory usage.

RTSP streams, and CPU/memory/GPU memory were all sufficient when it occurred. Anyway, I’m considering developing my own tracker now.

  1. Could you share your whole media pipeline?
  2. There is a similar issue. I suggest printing the objects’ rectangle coordinate values: add a probe function on the tracker’s sink pad and print each object’s rectangle values to check whether any are negative (see the sketch after this list). Please refer to osd_sink_pad_buffer_probe of deepstream_test4_app.c in the DeepStream SDK.
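
A minimal sketch of such a probe, assuming the standard DeepStream batch-meta header gstnvdsmeta.h; the function and pad names are illustrative:

#include <gst/gst.h>
#include "gstnvdsmeta.h"

static GstPadProbeReturn
tracker_sink_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info,
    gpointer u_data)
{
  NvDsBatchMeta *batch_meta =
      gst_buffer_get_nvds_batch_meta ((GstBuffer *) info->data);
  if (!batch_meta)
    return GST_PAD_PROBE_OK;

  for (NvDsMetaList * l_frame = batch_meta->frame_meta_list; l_frame;
      l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    for (NvDsMetaList * l_obj = frame_meta->obj_meta_list; l_obj;
        l_obj = l_obj->next) {
      NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;
      NvOSD_RectParams *r = &obj_meta->rect_params;
      /* Flag rects that could make the tracker's NvBufSurfTransform
       * crop fail with invalid parameters */
      if (r->left < 0 || r->top < 0 || r->width <= 0 || r->height <= 0)
        g_print ("stream %u frame %d: suspicious rect l=%.1f t=%.1f w=%.1f h=%.1f\n",
            frame_meta->source_id, frame_meta->frame_num,
            r->left, r->top, r->width, r->height);
    }
  }
  return GST_PAD_PROBE_OK;
}

Attach it with gst_pad_add_probe (tracker_sink_pad, GST_PAD_PROBE_TYPE_BUFFER, tracker_sink_pad_buffer_probe, NULL, NULL).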

I’m afraid my whole pipeline is too complex to share. And I don’t think this was caused by an invalid rect because, as I said, I’ve already set crop-objects-to-roi-boundary=1 in the detector; here is the related source code in nvinfer:

void
attach_metadata_detector (GstNvInfer * nvinfer, GstMiniObject * tensor_out_object,
    GstNvInferFrame & frame, NvDsInferDetectionOutput & detection_output, float segmentationThreshold)
{
  ...
  if(nvinfer->crop_objects_to_roi_boundary){
      if(obj.top < 0)
        obj.top = 0;
      if(obj.left < 0)
        obj.left = 0;
      if (obj.top < filter_params.roiTopOffset)
        obj.top = filter_params.roiTopOffset;
      if (obj.left + obj.width >= frame.input_surf_params->width)
        obj.width = frame.input_surf_params->width - obj.left;
      if (obj.top + obj.height > (frame.input_surf_params->height - filter_params.roiBottomOffset))
        obj.height = frame.input_surf_params->height - filter_params.roiBottomOffset - obj.top;
    }
}

This should prevent negative values from being pushed into the tracker.
Since it’s so hard to debug, I’ve switched to my own tracker for now. I’ll follow up if I find anything new.

I think this problem is triggered when a large number of objects are pushed into nvtracker. You can try to reproduce this issue with the following code:
testcode.zip (6.9 KB)
To reproduce, you may need a video that contains a lot of objects in one frame; sample_1080p_h264.mp4 has too few objects to trigger this problem. The config files and engines all come from nvcr.io/nvidia/deepstream:6.2-devel.

Thanks for the update, I will try.

Could you share how to compile and run it? Could you share the test video by private email? We don’t know how many objects are needed. Thanks!

Sorry for the inconvenience. I’ve added a build.sh script to the new zip; you can build it in nvcr.io/nvidia/deepstream:6.2-devel. As for the video, it’s too big to post, but you can make one from 01.jpg (also in the zip file) using ffmpeg:

ffmpeg -loop 1 -i 01.jpg -c:v libx264 -t 600 -r 25 -pix_fmt yuv420p -vf "scale=1920:1080" -g 50 output.mp4

test.zip (5.1 MB)

  1. Using the same code and video as yours, I can’t reproduce “NvBufSurfTransform failed with error -3” on T4 + DS 6.2. Here is the log:
    log.txt (3.9 KB). Notably, there is an error “GPUassert: out of memory src/modules/ReID/ReIDFeatureGallery.cpp 228”.
  2. If I set 10 URLs, there is no error. Here is the log:
    log1.txt (7.0 KB)

I ran the test code on an A10 card; when the error occurred, it was using about 5000 MB of GPU memory, and USE_NEW_NVSTREAMMUX="yes" was set. (I tried to run it without the new nvstreammux and nothing happened; sorry I missed mentioning this earlier.) Here’s my log:
log.txt (5.4 KB)
Could you please try it again under the same conditions?

Have you reproduced it successfully? Is there anything else I can do?

Did you use the new nvstreammux? “export USE_NEW_NVSTREAMMUX=yes” enables it.

Yes, I used the new nvstreammux.

Anything new?

  1. Using the new nvstreammux and the same code as yours, I still get “GPUassert: out of memory” on T4.
  2. Based on step 1, I modified the 50 URLs down to 10 in main.cpp and hit a hang instead; here is the log:
    log613.txt (7.4 KB)

Could you try it on an A10? 10 URLs are not enough to produce sufficient objects.

I tested it on a T4 and reproduced it successfully. Could you first check that you have enough sources? GPU and CPU memory usage were about 5 GB when it occurred.