Custom ONNX model, TensorRT Engine with PyCUDA in Deepstream

• Hardware Platform (Jetson / GPU) dGPU (Tesla T4)
• DeepStream Version 6.1.1
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only) 515.65.01
• Issue Type( questions, new requirements, bugs) Question

Using the ONNX model from WoodScape/omnidet at master · valeoai/WoodScape · GitHub

Note: I have a TensorRT engine built from the above ONNX model, and it works in the TensorRT framework with PyCUDA. The same engine file (with NHWC layout) can’t run in a DeepStream pipeline because DeepStream requires an NCHW memory layout. I tried to convert the engine file to NCHW, with no luck.

It would be easier if I could reuse my PyCUDA inference code inside the DeepStream pipeline.
Do you have any reference document for using PyCUDA with DeepStream?
I saw this but it is not of much help

There is a “network-input-order” parameter in the nvinfer configuration, so you don’t need to convert the layout of your model. You also need to set the “infer-dims” parameter to give gst-nvinfer the correct input dimensions of your model. Please read the document carefully.

gst-nvinfer is TensorRT based which is CUDA accelerated already.
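
For reference, the two settings mentioned above go in the [property] group of the gst-nvinfer configuration file. Below is a minimal sketch that writes such a group from Python. The engine file name and the dimensions are placeholders, not values taken from the model in this thread, and build_nvinfer_config is a hypothetical helper.

```python
import configparser
import io


def build_nvinfer_config(engine_path, channels, height, width, nhwc=True):
    """Return the text of a minimal gst-nvinfer [property] group (sketch only)."""
    cfg = configparser.ConfigParser()
    cfg["property"] = {
        "gpu-id": "0",
        # Placeholder path to a serialized TensorRT engine
        "model-engine-file": engine_path,
        "batch-size": "1",
        # 0 = NCHW, 1 = NHWC
        "network-input-order": "1" if nhwc else "0",
        # Dimensions of the image input layer (placeholder values)
        "infer-dims": f"{channels};{height};{width}",
    }
    buf = io.StringIO()
    # Write key=value without spaces around the delimiter
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


config_text = build_nvinfer_config("model.engine", 3, 288, 544)
print(config_text)
```

Note that, as discussed later in this thread, these settings alone are not enough for a model with more than one input layer.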

Hi Fiona,
Thanks for your response.
I’ve seen the “nvinfer” plugin, but using it for our custom model would require implementing a custom inference method. Since we have a PyCUDA-based inference script for our custom model, I was wondering if I could reuse that script for a quick check in DeepStream.

Can you tell us what kind of custom inference method you need?

Below is the method I’m using for inference and I’m looking for equivalent functionality in Deepstream

When I had used NVInfer with tensorRT engine file, I had seen the issue below

Similar to the issue seen here “RGB/BGR input format specified but network input channels is not 3”

It seems you need several consecutive frames as the model input, right?

Please refer to the deepstream-3d-action-recognition sample in /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition. It also uses a model that takes consecutive frames as input.

Ok, I will check it out.
But my initial question was: is it possible to replace nvinfer with PyCUDA inference in DeepStream? Let me know.

DeepStream is built around batched NvBufSurface buffers. If the PyCUDA inference plugin can handle the batch metadata (NvDsBatchMeta) and NvBufSurface correctly, it can work with other DeepStream plugins. See the NVIDIA DeepStream SDK API Reference: NvBufSurface Types and Functions.

ok, I’ll read the docs and try to understand the details you mentioned.

Also, a second question:
Even after setting “network-input-order” and “infer-dims”, I’m seeing the error “RGB/BGR input format specified but network input channels is not 3”.
Does this mean DeepStream doesn’t automatically support the NHWC layout, and that I need to add a custom function to convert the input frame from NCHW to NHWC?
If you could provide an example of how to do so, it would be great.
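
For readers who only want to see what the layout conversion itself amounts to, here is a minimal host-side sketch with NumPy. It only illustrates the index permutation between NCHW and NHWC. It is not how gst-nvinfer handles layouts internally, and the function names and the small tensor shape are made up for the example.

```python
import numpy as np


def nchw_to_nhwc(batch):
    """Permute (N, C, H, W) -> (N, H, W, C) and make the result contiguous."""
    return np.ascontiguousarray(batch.transpose(0, 2, 3, 1))


def nhwc_to_nchw(batch):
    """Permute (N, H, W, C) -> (N, C, H, W) and make the result contiguous."""
    return np.ascontiguousarray(batch.transpose(0, 3, 1, 2))


# Tiny dummy batch: 1 image, 3 channels, 4 rows, 5 columns
frame_nchw = np.arange(1 * 3 * 4 * 5, dtype=np.float16).reshape(1, 3, 4, 5)
frame_nhwc = nchw_to_nhwc(frame_nchw)
print(frame_nhwc.shape)
```

The call to np.ascontiguousarray matters when the array is later copied to the GPU, because transpose alone only changes the strides and not the memory order.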

Your model has multiple input layers. The default gst-nvinfer only accepts models with a single input layer. So please change your model to a single-input-layer model and refer to the deepstream-3d-action-recognition sample in /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition, which also uses a model that takes consecutive frames as input.

Thanks Fiona, I’ll check it out and try the way you have mentioned.

Can you elaborate on the details of your model? The input layer is the image in HWC format. What are the input layers “i_encoder_0”, “i_encoder_1”, …? How did you generate the inputs to them?

Yes, the first input is the video frame.
The other inputs (i_encoder_0, …, i_encoder_4) are initialized with zeros. On each subsequent iteration, the previous frame’s encoder outputs (o_encoder_0, …, o_encoder_4) are assigned to the corresponding inputs (i_encoder_0, …, i_encoder_4) of the current frame.
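
The feedback scheme described above can be sketched in plain NumPy as follows. fake_engine is a made-up stand-in for the real TensorRT engine, and the state shape is arbitrary. Only the zero initialization and the output-to-input hand-over mirror the description.

```python
import numpy as np

NUM_STATES = 5  # i_encoder_0 .. i_encoder_4


def fake_engine(frame, states):
    """Stand-in for the engine: returns new encoder states and a prediction."""
    new_states = [s + frame.mean() for s in states]
    prediction = sum(float(s.sum()) for s in new_states)
    return new_states, prediction


def run_sequence(frames, state_shape=(2, 2)):
    # First frame: all encoder inputs start as zeros
    states = [np.zeros(state_shape, dtype=np.float32) for _ in range(NUM_STATES)]
    predictions = []
    for frame in frames:
        new_states, prediction = fake_engine(frame, states)
        predictions.append(prediction)
        # o_encoder_k of this frame becomes i_encoder_k of the next frame
        states = new_states
    return predictions, states


frames = [np.full((4, 4, 3), v, dtype=np.float32) for v in (1.0, 2.0, 3.0)]
predictions, final_states = run_sequence(frames)
print(predictions)
```

Because each result depends on the state left by the previous frame, frames of one stream have to be processed strictly in order.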

For me as a beginner, it’s challenging to integrate this model into a DeepStream pipeline.

Even after setting this parameter to 1 (NHWC), I’m getting the same error as before:

 nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::preparePreprocess() <nvdsinfer_context_impl.cpp:971> [UID = 1]: RGB/BGR input format specified but network input channels is not 3

The parameter ‘network-input-order’ seems to be ignored even though ‘input-tensor-meta’ is not enabled.
Any suggestions on how to proceed?

gst-nvinfer does not support multiple input layers. Please convert your model to a single-input-layer model. Otherwise, you have to modify the gst-nvinfer plugin source code to add functions that generate the data for all input layers.

What are the inputs, and how did you get and process them?

Below is the Main code

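# Standard imports needed by this excerpt (project-specific imports such as
# cudatrt, collect_tupperware and FishNeurAllTRT are not shown in the post)
import glob
import os
import time

import numpy as np

if __name__ == "__main__":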
    # Read the yaml file
    config_params = collect_tupperware()
    inference = FishNeurAllTRT(config_params)

    assert os.path.isdir(config_params['fisheye_dir']), f"Cannot find folder {config_params['fisheye_dir']}"
    print(f"=> Extracting test sequence from {config_params['fisheye_dir']}")
    image_list = sorted(glob.glob(os.path.join(config_params['fisheye_dir'], '*.jpg')))
    # Create output dir
    os.makedirs(inference.output_dir,  exist_ok=True)

    # Memory allocation
    stream = cudatrt.get_cuda_stream()
    inputs, outputs, bindings, _ = cudatrt.allocate_buffers(inference.trt_engine, False)

    # Start inference loop
    infer_time = 0
    pre_time = 0
    post_time = 0

    dtype = np.float16

    with inference.trt_engine.create_execution_context() as context:
        #context.profiler = trt.Profiler()
        print("Number of images: ", len(image_list))
        for idx,  frame_path in enumerate(image_list):
            start_pre_op = time.time()
            inference.pre_image_op_nosiamese(inputs, frame_path, dtype=dtype)
            stop_pre_op = time.time()
            pre_time += stop_pre_op - start_pre_op
            start_infer = time.time()
            cudatrt.infer_trt_engine_nosiamese(inputs, outputs, inference.max_batch_size, bindings, stream, context)

Below are the sub modules based on PyCUDA

import numpy as np
import pycuda.driver as cuda
import tensorrt as trt


def allocate_buffers(trt_engine, pipelining=False):
    """
    From the input trt_engine, input/output buffers are allocated and the bindings defined.
    If pipelining is enabled, temporary output buffers are also allocated.

    :param trt_engine: trt_engine
    :param pipelining: if True, output buffers are duplicated for pipelining
    :return: the lists of allocated input_buffers and output_buffers, the list of bindings,
             and the temporary output buffers
    """
    input_buffers = []
    output_buffers = []
    output_buffers_tmp = []  # For pipelining
    bindings = []
    for binding in trt_engine:
        print("Binding: ", binding)
        size = trt.volume(trt_engine.get_binding_shape(binding))
        dtype = trt.nptype(trt_engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if trt_engine.binding_is_input(binding):
            input_buffers.append(HostDeviceMem(host_mem, device_mem))
            print('input engine.get_binding_dtype(binding)', trt_engine.get_binding_dtype(binding))
            print('input engine.get_binding_shape(binding)', trt_engine.get_binding_shape(binding))
        else:
            output_buffers.append(HostDeviceMem(host_mem, device_mem))
            print('output engine.get_binding_dtype(binding)', trt_engine.get_binding_dtype(binding))
            print('output engine.get_binding_shape(binding)', trt_engine.get_binding_shape(binding))
            if pipelining:
                output_buffers_tmp.append(np.empty(trt_engine.get_binding_shape(binding), dtype=dtype))

    print("bindings", bindings)
    return input_buffers, output_buffers, bindings, output_buffers_tmp

def infer_trt_engine_nosiamese(inputs, outputs, max_batch_size, bindings, stream, context):
    # Transfer input data (the image only, inputs[0]) to the GPU.
    for inp in inputs[0:1]:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    # Run inference.
    # context.profiler = trt.Profiler()
    context.execute_async(batch_size=max_batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer encoder outputs of frame N to the encoder inputs of frame N+1
    for i in range(0, 5):
        cuda.memcpy_dtod_async(inputs[i + 1].device, outputs[i].device, outputs[i].host.nbytes, stream)
    # Transfer predictions back from the GPU.
    for out in outputs[5:]:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
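
The HostDeviceMem helper used above is not shown in the post. A minimal sketch, assuming it is a plain host/device pair similar to the helper of the same name in the TensorRT Python samples, could look like this. The device buffer is replaced by None here so that the snippet runs without a GPU.

```python
import numpy as np


class HostDeviceMem:
    """Pair a host (CPU) buffer with its matching device (GPU) buffer."""

    def __init__(self, host_mem, device_mem):
        self.host = host_mem      # NumPy array (page-locked in the real code)
        self.device = device_mem  # device allocation in the real code

    def __repr__(self):
        return f"HostDeviceMem(host_nbytes={self.host.nbytes}, device={self.device!r})"


# Stand-in usage without a GPU: None replaces the device allocation
pair = HostDeviceMem(np.zeros(6, dtype=np.float16), device_mem=None)
print(pair)
```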

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Your model seems to be an LSTM model. Can you try Triton Inference Server? See Triton Inference Server | NVIDIA Developer and Gst-nvinferserver — DeepStream 6.2 Release documentation.

It would be better if you could provide your model to us for further investigation.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.