Failed to allocate cuda output buffer during context initialization

Hardware Platform: Jetson Xavier
DeepStream Version: 6.2
JetPack Version: 5.1.1
TensorRT Version: 8.5.2
Issue Type: Question

I am trying to run a RetinaNet TensorRT engine with deepstream-app, but I am seeing the following error:

0:00:06.531483429 54380 0xaaaae4e45f20 ERROR                nvinfer gstnvinfer.cpp:674:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::allocateBuffers() <nvdsinfer_context_impl.cpp:1437> [UID = 1]: Failed to allocate cuda output buffer during context initialization
0:00:06.531575401 54380 0xaaaae4e45f20 ERROR                nvinfer gstnvinfer.cpp:674:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::initialize() <nvdsinfer_context_impl.cpp:1289> [UID = 1]: Failed to allocate buffers

Is there something I should check, or a way to get more information about what is going wrong here?

Is it related to the sizes being 0 for some of the outputs?

INFO: [Implicit Engine Info]: layers num: 4
0   INPUT  kFLOAT image           3x2176x3840     
1   OUTPUT kFLOAT scores          0               
2   OUTPUT kFLOAT boxes           4               
3   OUTPUT kINT32 labels          0               

I am not sure why this is the case; the output sizes are static in the ONNX file before I build the TensorRT engine.
(screenshot of the ONNX model's outputs omitted)
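
A quick way to double-check the shapes recorded for the graph outputs is something like the following (the file name is just a placeholder for my exported model):

    import onnx

    # "retinanet.onnx" is a placeholder; point it at the actual exported model.
    model = onnx.load("retinanet.onnx")
    for out in model.graph.output:
        dims = [d.dim_param or d.dim_value for d in out.type.tensor_type.shape.dim]
        print(out.name, dims)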

  1. Can the model be run successfully by a third-party tool?
  2. The nvinfer plugin is open source. You can find this error in NvDsInferContextImpl::allocateBuffers in /opt/nvidia/deepstream/deepstream-6.2/sources/libs/nvdsinfer/nvdsinfer_context_impl.cpp, and you can add a log there to check whether the size is zero when the buffer is allocated. In particular, please rebuild the code in /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvinfer/ and replace the old library /opt/nvidia/deepstream/deepstream/lib/gst-plugins/libnvdsgst_infer.so.
  1. I am able to run the model using Python code adapted from https://github.com/NVIDIA/TensorRT/blob/release/8.6/samples/python/introductory_parser_samples/onnx_resnet50.py and I get good output.
  2. I can see where that error is being caught now. It seems that layerInfo.inferDims.numElements is 0. Do you know why that is? Inspecting the ONNX model used for the TensorRT conversion shows static sizes.

When converting the ONNX model to a TensorRT engine, I see this in the output:

[07/25/2023-10:44:41] [I] Created input binding for image with dimensions 1x3x2176x3840
[07/25/2023-10:44:41] [I] Using random values for output scores
[07/25/2023-10:44:41] [I] Created output binding for scores with dimensions 4254264
[07/25/2023-10:44:41] [I] Using random values for output boxes
[07/25/2023-10:44:41] [I] Created output binding for boxes with dimensions 4254259x4
[07/25/2023-10:44:41] [I] Using random values for output labels
[07/25/2023-10:44:41] [I] Created output binding for labels with dimensions 4254259

So it looks like the sizes are correct when the engine is built, but then I see this when running it in DeepStream:

0   INPUT  kFLOAT image           3x2176x3840     
1   OUTPUT kFLOAT scores          0               
2   OUTPUT kFLOAT boxes           4               
3   OUTPUT kINT32 labels          0   

It seems like a dimension is being removed for some reason?
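
For comparison, the binding shapes stored in the serialized engine itself can be printed with the TensorRT Python API. A minimal sketch (the engine path is a placeholder):

    import tensorrt as trt

    # "retinanet.engine" is a placeholder; use the engine file that deepstream-app loads.
    logger = trt.Logger(trt.Logger.WARNING)
    with open("retinanet.engine", "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())

    for i in range(engine.num_bindings):
        kind = "INPUT " if engine.binding_is_input(i) else "OUTPUT"
        print(kind, engine.get_binding_name(i), engine.get_binding_shape(i))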

I was able to resolve this by adding an extra dimension to the scores, boxes, and labels outputs in PyTorch.
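
My guess is that nvinfer treats the first dimension of each binding as the batch dimension, so a purely 1-D output like scores is left with no dimensions at all. Something along these lines worked (a simplified sketch: the dummy model, file name, and opset are placeholders rather than my actual model or export settings, and unsqueeze is just one way to add the extra dimension):

    import torch

    class DummyDetector(torch.nn.Module):
        # Stand-in for the real RetinaNet head: returns 1-D scores/labels and Nx4 boxes,
        # i.e. the shapes that showed up as 0 and 4 in the engine info above.
        def forward(self, image):
            n = 100
            feat = image.mean()  # trivial dependence on the input so tracing keeps the graph
            scores = feat + torch.zeros(n)
            boxes = feat + torch.zeros(n, 4)
            labels = torch.zeros(n, dtype=torch.int32)
            return scores, boxes, labels

    class ExportWrapper(torch.nn.Module):
        # Adds an explicit leading dimension to every output before ONNX export,
        # so scores/boxes/labels become 1xN, 1xNx4, and 1xN instead of N, Nx4, N.
        def __init__(self, model):
            super().__init__()
            self.model = model

        def forward(self, image):
            scores, boxes, labels = self.model(image)
            return scores.unsqueeze(0), boxes.unsqueeze(0), labels.unsqueeze(0)

    wrapper = ExportWrapper(DummyDetector()).eval()
    dummy = torch.randn(1, 3, 2176, 3840)  # matches the 3x2176x3840 input above
    torch.onnx.export(
        wrapper, dummy, "retinanet_batched.onnx",
        input_names=["image"],
        output_names=["scores", "boxes", "labels"],
        opset_version=16,
    )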
