Assertion `buf->getTotalBytes() >= bytes' failed python backend

I’m trying to build a face detection pipeline with landmarks, based on the SCRFD model, using deepstream-triton. The ensemble pipeline is:
Preprocess (Python: resize, normalize, transpose) → Infer SCRFD (TensorRT with custom NMS plugins) → Post-process (Python: rescale, crop face, normalize) → Face Embedding
The ensemble model works well with triton-inference-server. However, when I run it with deepstream-triton 6.0, I get this error: Assertion `buf->getTotalBytes() >= bytes' failed. I’ve found that the error is raised when the Python backend returns dynamic-shape tensors in the Post-process stage.

Config Post-process

name: "post_scrfd_nms"
backend: "python"
max_batch_size: 1

input [
  {
    name: "num_detections"
    data_type: TYPE_INT32
    dims: [1]
  },
  {
    name: "nmsed_boxes"
    data_type: TYPE_FP32
    dims: [200, 4]
  },
  {
    name: "nmsed_scores"
    data_type: TYPE_FP32
    dims: [200]
  },
  {
    name: "nmsed_classes"
    data_type: TYPE_FP32
    dims: [200]
  },
  {
    name: "nmsed_landmarks"
    data_type: TYPE_FP32
    dims: [200, 10]
  },
  {
    name: "original_image"
    data_type: TYPE_FP32
    dims: [3, 1080, 1920]
  }
]
output [
  {
    name: "res_num_detections"
    data_type: TYPE_INT32
    dims: [1]
  },
  {
    name: "res_bboxes"
    data_type: TYPE_FP32
    dims: [-1, 4]
  },
  {
    name: "res_scores"
    data_type: TYPE_FP32
    dims: [-1]
  },
  {
    name: "res_landmarks"
    data_type: TYPE_FP32
    dims: [-1, 5, 2]
  },
  {
    name: "res_face_align"
    data_type: TYPE_FP32
    dims: [-1, 3, 112, 112]
  }
]

parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "/deepstream/triton/envs/face_align_38.tar.gz"}
}

Setup:
• GPU: NVIDIA A100
• Container: nvcr.io/nvidia/deepstream:6.0-triton
Note: The batchedNMSCustomPlugin is built by adding a landmarks array to the original batchedNMSPlugin.
Git: GitHub - NNDam/deepstream-face-detection: Face detection with deepstream triton

For dynamic batch with nvinferserver, please refer to YOLOV4- DS-TRITON | Configuration specified max-batch 4 but TensorRT engine only supports max-batch 1 - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

Hi @Fiona.Chen,
After changing to max_batch_size = 0 and adding a dynamic batch dimension, I still have the problem with dynamic shapes in the Python backend (static shapes work fine).
For example (batch-size = 1):
Dynamic shape config (the first dimension is the batch size):
(-1, 1), (-1, -1, 4), (-1, -1), (-1, -1, 5, 2), (-1, -1, 3, 112, 112)
The first request returns shapes with a dynamic dimension of 4:
(1, 1), (1, 4, 4), (1, 4), (1, 4, 5, 2), (1, 4, 3, 112, 112)
The second and third requests return shapes with a dynamic dimension < 4:
(1, 1), (1, 2, 4), (1, 2), (1, 2, 5, 2), (1, 2, 3, 112, 112) → OK
(1, 1), (1, 3, 4), (1, 3), (1, 3, 5, 2), (1, 3, 3, 112, 112) → OK
If a later request returns shapes with a dynamic dimension > 4, the error is raised.
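To make the pattern above concrete, here is a small sketch that computes the bytes each FP32 output needs for a given number of faces and checks them against the size of the first response. The "buffer is sized by the first response" rule is only an assumption that matches the observed behaviour, not confirmed nvinferserver internals. The helper names `output_bytes` and `first_failing_request` are hypothetical.

```python
import numpy as np

FP32_BYTES = np.dtype(np.float32).itemsize  # 4 bytes per element

# Elements per detected face for each FP32 output in the post-process config.
PER_FACE_ELEMS = {
    "res_bboxes": 4,
    "res_scores": 1,
    "res_landmarks": 5 * 2,
    "res_face_align": 3 * 112 * 112,
}


def output_bytes(num_faces):
    """Bytes needed by each output tensor for one frame with num_faces faces."""
    return {name: num_faces * n * FP32_BYTES for name, n in PER_FACE_ELEMS.items()}


def first_failing_request(face_counts):
    """Index of the first request that needs more bytes than request 0 did.

    ASSUMPTION: the response buffer capacity is fixed by the first request.
    Returns None if every request fits.
    """
    capacity = output_bytes(face_counts[0])
    for i, n in enumerate(face_counts):
        needed = output_bytes(n)
        if any(needed[name] > capacity[name] for name in needed):
            return i
    return None


if __name__ == "__main__":
    print(output_bytes(4))
    print(first_failing_request([4, 2, 3, 5]))
```

Under this assumed model, the sequence 4, 2, 3 fits and a following request with 5 faces is the first one that does not, which is consistent with where the assertion appears.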

Another example:

(1, 1) (1, 9, 4) (1, 9) (1, 9, 5, 2) (1, 9, 3, 112, 112) # First request
Decodebin child added: source

Decodebin child added: decodebin0

Decodebin child added: qtdemux0

Decodebin child added: multiqueue0

Decodebin child added: h264parse0

Decodebin child added: capsfilter0

Decodebin child added: nvv4l2decoder0

In cb_newpad

gstname= video/x-raw
features= <Gst.CapsFeatures object at 0x7fa274316fa0 (GstCapsFeatures at 0x7f9f2807c260)>
(1, 1) (1, 7, 4) (1, 7) (1, 7, 5, 2) (1, 7, 3, 112, 112)
(1, 1) (1, 5, 4) (1, 5) (1, 5, 5, 2) (1, 5, 3, 112, 112)
(1, 1) (1, 7, 4) (1, 7) (1, 7, 5, 2) (1, 7, 3, 112, 112)
(1, 1) (1, 8, 4) (1, 8) (1, 8, 5, 2) (1, 8, 3, 112, 112)
(1, 1) (1, 10, 4) (1, 10) (1, 10, 5, 2) (1, 10, 3, 112, 112)
python3: infer_trtis_backend.cpp:243: nvdsinferserver::SharedBatchBuf nvdsinferserver::TrtISBackend::allocateResponseBuf(const string&, size_t, nvdsinferserver::InferMemType, int64_t): Assertion `buf->getTotalBytes() >= bytes' failed.
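Since static shapes work, one possible workaround (a sketch only, not verified with DeepStream) is to zero-pad every per-face output to a fixed number of rows and let the consumer use res_num_detections to ignore the padding. The cap of 200 is taken from the [200, ...] NMS tensors in the config. The function names `pad_to_static` and `make_static_outputs` are hypothetical, and the output dims in the model config would also have to change from -1 to the chosen cap.

```python
import numpy as np

# Taken from the NMS output size in the config; a smaller cap would save memory.
MAX_FACES = 200


def pad_to_static(arr, max_faces=MAX_FACES):
    """Zero-pad (or truncate) axis 0 to max_faces so the shape never changes."""
    arr = np.asarray(arr, dtype=np.float32)
    out = np.zeros((max_faces,) + arr.shape[1:], dtype=np.float32)
    n = min(arr.shape[0], max_faces)
    out[:n] = arr[:n]
    return out


def make_static_outputs(bboxes, scores, landmarks, face_align, max_faces=MAX_FACES):
    """Build fixed-shape arrays named after the outputs in the config.

    Inputs are arrays of shape (N, 4), (N,), (N, 5, 2) and (N, 3, 112, 112).
    """
    n = min(len(bboxes), max_faces)
    return {
        "res_num_detections": np.array([n], dtype=np.int32),
        "res_bboxes": pad_to_static(bboxes, max_faces),
        "res_scores": pad_to_static(scores, max_faces),
        "res_landmarks": pad_to_static(landmarks, max_faces),
        "res_face_align": pad_to_static(face_align, max_faces),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    outputs = make_static_outputs(
        rng.random((3, 4)), rng.random(3), rng.random((3, 5, 2)),
        rng.random((3, 3, 112, 112)), max_faces=8,
    )
    for name, value in outputs.items():
        print(name, value.shape)
```

The trade-off is memory: with 200 rows, res_face_align alone is 200 × 3 × 112 × 112 FP32 values per frame, so a lower cap may be preferable if the scene never contains that many faces.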