Poor performance due to INMSLayer follow-up layers nms_layer_output(1)[DevicetoShapeHostCopy] and trainstation

Description

I am using the marcoslucianops/DeepStream-Yolo-Seg repository on GitHub (NVIDIA DeepStream SDK 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 implementation for YOLO-Segmentation models) as a reference for adding output tensor parsing and clustering to a YOLO segmentation model with ONNX GraphSurgeon. I am now seeing large latencies in a step that runs right after the NonMaximumSuppression layer, named nms_layer_output(1)[DevicetoShapeHostCopy], plus an added trainstation layer that synchronizes with the host.

As stated in https://forums.developer.nvidia.com/t/inmslayer-cuda-graph-invalidation-devicetoshapehostcopy/338025/6, the INMSLayer appears to synchronize with the host, I guess because of its dynamic output size.
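To make the "dynamic output size" point concrete, here is a minimal pure-Python sketch. It is not TensorRT code, and `greedy_nms` and `padded_nms` are hypothetical helper names used only for illustration. It shows that two inputs of identical shape can yield a different number of selected boxes, which is why the consumer has to learn the count before it can size the output, and how padding to a fixed maximum would give a static shape instead. Whether INMSLayer can be configured to behave like the padded variant is exactly the open question here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_thr: float = 0.5, score_thr: float = 0.25) -> List[int]:
    """Plain greedy NMS; the length of the result depends on the data."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


def padded_nms(boxes: Sequence[Box], scores: Sequence[float], max_out: int,
               pad: int = -1) -> Tuple[List[int], int]:
    """Same selection, but always returns exactly max_out slots plus a count."""
    keep = greedy_nms(boxes, scores)[:max_out]
    return keep + [pad] * (max_out - len(keep)), len(keep)


scores = [0.9, 0.8, 0.7]
overlapping = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
separate = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10)]

# Same input shape, different output length: this is the data-dependent part.
print(len(greedy_nms(overlapping, scores)), len(greedy_nms(separate, scores)))

# Fixed-size variant: the shape is always max_out, the valid count travels alongside.
print(padded_nms(overlapping, scores, max_out=4))
```

With the padded form, downstream steps can be sized for `max_out` up front and simply ignore the padding slots, at the cost of always processing the maximum number of entries.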

Is there a way to fix the dimension of INMSLayer and make it run without this synchronisation/copy to host?
Which setting is causing this? A missing TopK, or an incorrect maxoutputBboxes?

https://docs.nvidia.com/deeplearning/tensorrt/latest/_static/c-api/classnvinfer1_1_1_i_n_m_s_layer.html

Environment

TensorRT Version: 10.3
GPU Type: 4070 Ti Laptop
Nvidia Driver Version: 575
CUDA Version: 12.6
CUDNN Version: 9.3
Operating System + Version: Ubuntu 24.04 LTS
Python Version: 3.10
PyTorch Version: 2.6.0
Container: nvcr.io/nvidia/deepstream:7.1-triton-multiarch

I tried to create an app that adds the INMSLayer and RoiAlign itself, to have more control, but I couldn't find any way to stop TensorRT from adding the two additional layers.

After looking into it a bit more, it seems this has been an issue for a long time, and there has never been an answer on this topic from NVIDIA.

As stated here and in my previous post, "Output-tensor-meta Access RAW model output with batch dimension", I will keep running NMS and RoiAlign outside the model in a custom postprocessor until someone feels obligated to answer this issue.

@Fiona.Chen can you please make sure someone responsible for the TensorRT forum has a look at this? On DeepStream forum topics the support is great, but here it seems to be a wasteland…

Maybe @fanzh can help out?