Failed to allocate CUDA output buffer during context initialization

When converting the ONNX model to a TensorRT engine, I see this in the build output:

[07/25/2023-10:44:41] [I] Created input binding for image with dimensions 1x3x2176x3840
[07/25/2023-10:44:41] [I] Using random values for output scores
[07/25/2023-10:44:41] [I] Created output binding for scores with dimensions 4254264
[07/25/2023-10:44:41] [I] Using random values for output boxes
[07/25/2023-10:44:41] [I] Created output binding for boxes with dimensions 4254259x4
[07/25/2023-10:44:41] [I] Using random values for output labels
[07/25/2023-10:44:41] [I] Created output binding for labels with dimensions 4254259

So the output sizes look correct when the engine is built. But when I run the engine in DeepStream, I see this:

0   INPUT  kFLOAT image           3x2176x3840     
1   OUTPUT kFLOAT scores          0               
2   OUTPUT kFLOAT boxes           4               
3   OUTPUT kINT32 labels          0   

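It looks like the first dimension of every binding is being dropped: the input loses its leading 1, the 1-D outputs scores and labels show 0, and boxes shows only 4. Why would that happen?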
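To check the "dropped dimension" idea, here is a quick sketch. It assumes DeepStream treats the first dimension of each binding as the batch dimension and strips it before printing. `drop_leading_dim` and `build_shapes` are hypothetical names for illustration, not DeepStream or TensorRT APIs. Applying this rule to the shapes from the build log gives the same dimensions as the DeepStream table above:

```python
# Binding shapes copied from the TensorRT build log above.
build_shapes = {
    "image": (1, 3, 2176, 3840),
    "scores": (4254264,),
    "boxes": (4254259, 4),
    "labels": (4254259,),
}


def drop_leading_dim(shape):
    """Hypothesis (not a DeepStream API): treat shape[0] as batch and strip it."""
    rest = shape[1:]
    # A 1-D output has no dimensions left after stripping; represent that as 0,
    # which is how the empty shape appears in the DeepStream table.
    return rest if rest else (0,)


for name, shape in build_shapes.items():
    print(f"{name:8s} {'x'.join(map(str, drop_leading_dim(shape)))}")
```

If this hypothesis holds, the 1-D outputs scores and labels have no real dimension left once their only dimension is treated as the batch.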