Issues we face when using a Triton ensemble model through a gRPC call

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.0
• JetPack Version (valid for Jetson only): NA
• TensorRT Version: 8.0.1-1+cuda11.3
• NVIDIA GPU Driver Version (valid for GPU only): 510.54 (per libcuda.so.510.54 in the log below)
• Issue Type (questions, new requirements, bugs): bugs
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file content, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements. Include the module name, i.e. for which plugin or for which sample application, and the function description.)

We are using a standalone Triton server (docker image: nvcr.io/nvidia/tritonserver:21.08-py3) and running DeepStream in a container based on nvcr.io/nvidia/deepstream:6.0-triton, with nvinferserver talking to Triton over gRPC. The ensemble model uses dynamic batching. Everything works if I set max_batch_size: 0; if I set it to any other value, I get a segmentation fault. I have attached the logs from the Triton server and from the DeepStream nvinferserver side (kernel message). Kindly help me narrow down the issue.
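For context, the ensemble config.pbtxt is structured roughly as in the sketch below. The model names, tensor names, and dims are placeholders (only BBOX_OUTPUT and its per-batch [8400, 85] shape come from the logs), and the dynamic batcher is configured on the composing models rather than on the ensemble itself:

# Ensemble config.pbtxt sketch; names, dims, and batch size are placeholders
name: "detector_ensemble"
platform: "ensemble"
max_batch_size: 4        # works with 0, segfaults with any non-zero value
input [
  {
    name: "INPUT_IMAGE"
    data_type: TYPE_FP32
    dims: [ 3, 640, 640 ]
  }
]
output [
  {
    name: "BBOX_OUTPUT"
    data_type: TYPE_FP32
    dims: [ 8400, 85 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess_model"     # placeholder composing model
      model_version: -1
      input_map { key: "RAW_INPUT" value: "INPUT_IMAGE" }
      output_map { key: "PREPROCESSED" value: "preprocessed_tensor" }
    },
    {
      model_name: "detector_trt"         # placeholder composing model
      model_version: -1
      input_map { key: "images" value: "preprocessed_tensor" }
      output_map { key: "detections" value: "BBOX_OUTPUT" }
    }
  ]
}

The composing models carry the same max_batch_size plus a dynamic_batching block, e.g. dynamic_batching { preferred_batch_size: [ 4 ] max_queue_delay_microseconds: 100 }.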

DeepStream app error log (kernel-space log):

[346098.830193] GstInferServImp[45300]: segfault at 7f51bb331ce0 ip 00007f534a1ad730 sp 00007f5375fec608 error 4 in libcuda.so.510.54[7f5349fba000+13cb000]

Triton Server logs:

I0328 07:26:30.053641 1 infer_response.cc:165] add response output: output: BBOX_OUTPUT, type: FP32, shape: [4,8400,85]
I0328 07:26:30.053709 1 pinned_memory_manager.cc:161] pinned memory allocation: size 11424000, addr 0x7f842c000090
I0328 07:26:30.053740 1 ensemble_scheduler.cc:511] Internal response allocation: BBOX_OUTPUT, size 11424000, addr 0x7f842c000090, memory type 1, type id 0
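
On the DeepStream side, the nvinferserver gRPC config looks roughly like the sketch below; the model name, URL, and preprocessing values are placeholders for illustration, not our exact file, and max_batch_size here is the value that triggers the crash when non-zero:

infer_config {
  unique_id: 1
  gpu_ids: [ 0 ]
  max_batch_size: 4                    # works with 0, segfaults with any non-zero value
  backend {
    triton {
      model_name: "detector_ensemble"  # placeholder ensemble name
      version: -1
      grpc {
        url: "triton-server:8001"      # placeholder standalone Triton gRPC endpoint
      }
    }
  }
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    normalize {
      scale_factor: 0.0039215697906911373
    }
  }
  postprocess {
    other { }                          # raw BBOX_OUTPUT is parsed downstream
  }
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
  interval: 0
}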

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Can you reproduce this issue with another model that uses dynamic batching?
Also, can you share the model so we can try it locally?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.