Custom ONNX model, TensorRT Engine with PyCUDA in Deepstream

anushamanila9 · January 25, 2023, 1:52pm

• Hardware Platform (Jetson / GPU) dGPU (Tesla T4)
• DeepStream Version 6.1.1
• TensorRT Version 8.4.1.5
• NVIDIA GPU Driver Version (valid for GPU only) 515.65.01
• Issue Type( questions, new requirements, bugs) Question

Using the ONNX model from WoodScape/omnidet at master · valeoai/WoodScape · GitHub

Note: I have a TensorRT engine built from the above ONNX model, and it works in standalone TensorRT with PyCUDA. The same engine file (NHWC layout) does not run in a DeepStream pipeline because DeepStream expects an NCHW memory layout. I tried to convert the engine to NCHW, with no luck.

It would be easier if I could reuse my PyCUDA inference code inside the DeepStream pipeline.
Do you have any reference documentation for using PyCUDA with DeepStream?
I saw this but it is not of much help

Fiona.Chen · January 30, 2023, 3:34am

The nvinfer configuration has a “network-input-order” parameter, so you don’t need to convert your model from NHWC to NCHW. You also need to set the “infer-dims” parameter to give gst-nvinfer the correct input dimensions of your model. Please read the documentation carefully.

gst-nvinfer is based on TensorRT, so it is already CUDA accelerated.
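
For illustration, here is a minimal sketch of the two settings mentioned above, written out with Python's configparser. The engine file name and the 288x544 resolution are hypothetical placeholders, and the height;width;channels ordering of infer-dims for an NHWC model is an assumption that should be checked against the Gst-nvinfer documentation. It only applies to a model with a single image input.

```python
import configparser
import io


def build_nvinfer_properties(engine_path, height, width, channels=3):
    """Return the text of a [property] group for a gst-nvinfer config file."""
    config = configparser.ConfigParser()
    config["property"] = {
        "model-engine-file": engine_path,
        # 0 = NCHW, 1 = NHWC
        "network-input-order": "1",
        # Assumption: dimensions listed in the model's own (HWC) order.
        "infer-dims": f"{height};{width};{channels}",
    }
    buffer = io.StringIO()
    # Write "key=value" without spaces around the delimiter.
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Hypothetical engine name and input resolution.
config_text = build_nvinfer_properties("model.engine", 288, 544)
print(config_text)
```

The printed group can be pasted into the nvinfer config file next to the other properties the pipeline needs.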

anushamanila9 · January 30, 2023, 2:40pm

Hi Fiona,
Thanks for your response.
I’ve seen the “nvinfer” plugin, but using it for our custom model would require implementing a custom inference method. Since we already have a PyCUDA-based inference script for our custom model, I was wondering whether I could reuse that script for a quick check in DeepStream.

Fiona.Chen · January 30, 2023, 2:50pm

Can you tell us what kind of custom inference method you need?

anushamanila9 · January 30, 2023, 4:09pm

Below is the method I’m using for inference and I’m looking for equivalent functionality in Deepstream

When I used nvinfer with the TensorRT engine file, I saw the issue below

Similar to the issue seen here “RGB/BGR input format specified but network input channels is not 3”

Fiona.Chen · January 31, 2023, 1:26am

It seems you need several consecutive frames as the model input, right?

Please refer to the deepstream-3d-action-recognition sample in /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition. It also uses a model that takes consecutive frames as input.

anushamanila9 · January 31, 2023, 7:19am

Ok, I will check it out.
But my initial question was: is it possible to replace nvinfer with PyCUDA inference in DeepStream? Let me know.

Fiona.Chen · January 31, 2023, 7:29am

DeepStream is based on NvBufSurface batches. If the PyCUDA inference plugin can handle NvDsBatchMeta and NvBufSurface correctly, it can work with other DeepStream plugins. See: NVIDIA DeepStream SDK API Reference: NvBufSurface Types and Functions | NVIDIA Docs
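
For a quick check, a lighter-weight route than a full plugin is a pad probe written with the DeepStream Python bindings (pyds). This is a swapped-in technique, not the plugin approach described above. The sketch below assumes the approach used by the deepstream-imagedata-multistream Python sample: frames converted to RGBA upstream and, on dGPU, CUDA unified memory. The `run_inference` callback passed as user data is hypothetical and stands in for the existing PyCUDA code.

```python
def frame_probe(pad, info, run_inference):
    """Buffer probe: hand each frame of the batch to run_inference (hypothetical callback)."""
    # Imported lazily so this sketch can be loaded on machines without DeepStream.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    import numpy as np
    import pyds

    gst_buffer = info.get_buffer()
    if gst_buffer is None:
        return Gst.PadProbeReturn.OK

    # Batch-level metadata attached by nvstreammux.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        # NumPy view of this frame's NvBufSurface (assumes RGBA format).
        surface = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
        # Copy so the data stays valid after the buffer is released.
        frame = np.array(surface, copy=True, order="C")
        run_inference(frame, frame_meta.frame_num)
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```

A probe like this would be attached with `pad.add_probe(Gst.PadProbeType.BUFFER, frame_probe, run_inference)` on a pad downstream of nvstreammux and the RGBA conversion. It copies frames to host memory, so it is suited to a functional check rather than to a performance measurement.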

anushamanila9 · January 31, 2023, 7:50am

ok, I’ll read the docs and try to understand the details you mentioned.

Also, my second question:
Even after setting “network-input-order” and “infer-dims”, I’m seeing the error “RGB/BGR input format specified but network input channels is not 3”.
Does this mean DeepStream doesn’t automatically support the NHWC layout, and that I need to add a custom function to convert the input frame from NCHW to NHWC?
If you could provide an example of how to do so, that would be great.

Fiona.Chen · January 31, 2023, 7:53am

Your model has multiple input layers. The default gst-nvinfer accepts only models with a single input layer. So please change your model to a single-input-layer model and refer to the deepstream-3d-action-recognition sample in /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-3d-action-recognition. It also uses a model that takes consecutive frames as input.

anushamanila9 · January 31, 2023, 8:13am

Thanks Fiona, I’ll check it out and try the way you have mentioned.

Fiona.Chen · February 2, 2023, 6:12am

Can you elaborate on the details of your model? The input layer is the image in HWC format. What are the input layers “i_encoder_0”, “i_encoder_1”, …? How did you generate the inputs to them?

anushamanila9 · February 2, 2023, 9:16am

Yes, the first input is the video frame.
The other inputs (i_encoder_0, …, i_encoder_4) are initialized with zeros. On each subsequent iteration, the previous frame’s encoder outputs (o_encoder_0, …, o_encoder_4) are assigned to the corresponding inputs (i_encoder_0, …, i_encoder_4) of the current frame.
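
The feedback scheme described above can be sketched framework-independently in NumPy. This is only an illustration of the data flow: `fake_model` is a hypothetical stand-in for the TensorRT engine, and the state shape is arbitrary.

```python
import numpy as np

NUM_STATES = 5  # i_encoder_0 .. i_encoder_4


def fake_model(frame, states):
    """Hypothetical stand-in for the engine: returns a prediction and new encoder states."""
    new_states = [s + frame.mean() for s in states]
    prediction = float(sum(s.sum() for s in new_states))
    return prediction, new_states


def run_sequence(frames, state_shape=(2, 2)):
    # The encoder inputs start as zeros for the first frame.
    states = [np.zeros(state_shape, dtype=np.float32) for _ in range(NUM_STATES)]
    predictions = []
    for frame in frames:
        # The encoder outputs of frame N become the encoder inputs of frame N+1.
        prediction, states = fake_model(frame, states)
        predictions.append(prediction)
    return predictions, states


frames = [np.ones((4, 4, 3), dtype=np.float32) for _ in range(3)]
predictions, final_states = run_sequence(frames)
```

Any DeepStream integration has to keep this per-stream state alive between buffers, which a single-input, stateless preprocessing path does not do.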

As a beginner, I find it challenging to integrate this model into a DeepStream pipeline.

anushamanila9 · February 17, 2023, 6:11pm

Even after setting this parameter to 1 (NHWC), I’m getting the same error as before:

 nvinfer gstnvinfer.cpp:640:gst_nvinfer_logger:<primary-inference> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::preparePreprocess() <nvdsinfer_context_impl.cpp:971> [UID = 1]: RGB/BGR input format specified but network input channels is not 3

The parameter ‘network-input-order’ seems to be ignored even though ‘input-tensor-meta’ is not enabled.
Any suggestions on how to proceed?

Fiona.Chen · February 20, 2023, 8:31am

gst-nvinfer does not support multiple input layers. Please convert your model to a single-input-layer model. Otherwise, you have to modify the gst-nvinfer plugin source code to add functions that generate the data for all input layers.

What are the inputs, and how do you obtain and process them?

anushamanila9 · February 20, 2023, 8:19pm

Below is the main code:

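import glob
import os
import time

import numpy as np

# collect_tupperware, FishNeurAllTRT and cudatrt are project-specific (not shown in this post).

if __name__ == "__main__":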
    # Read the yaml file
    config_params = collect_tupperware()
    inference = FishNeurAllTRT(config_params)

    assert os.path.isdir(config_params['fisheye_dir']), f"Cannot find folder {config_params['fisheye_dir']}"
    print(f"=> Extracting test sequence from {config_params['fisheye_dir']}")
    image_list = sorted(glob.glob(os.path.join(config_params['fisheye_dir'], '*.jpg')))
   
    # Create output dir
    os.makedirs(inference.output_dir,  exist_ok=True)

    # Memory allocation
    stream = cudatrt.get_cuda_stream()
    inputs, outputs, bindings, _ = cudatrt.allocate_buffers(inference.trt_engine, False)

    # Start inference loop
    infer_time = 0
    pre_time = 0
    post_time = 0

    dtype = np.float16

    with inference.trt_engine.create_execution_context() as context:
        #context.profiler = trt.Profiler()
        print("Number of images: ", len(image_list))
        for idx,  frame_path in enumerate(image_list):
            start_pre_op = time.time()
            inference.pre_image_op_nosiamese(inputs, frame_path, dtype=dtype)
            stop_pre_op = time.time()
            pre_time += stop_pre_op - start_pre_op
            start_infer = time.time()
            cudatrt.infer_trt_engine_nosiamese(inputs, outputs, inference.max_batch_size, bindings, stream, context)
            stream.synchronize()
......

Below are the PyCUDA-based helper functions:

def allocate_buffers(trt_engine, pipelining=False):
    """
        :param trt_engine: trt_engine
        :param pipelining: if True, output_buffers are duplicated for pipelining
        :return: the lists of allocated input_buffers and output_buffers, the list of bindings,
                 and the temporary output buffers used for pipelining

        :Description:

        From the input trt_engine, input/output buffers are allocated and the bindings are defined.
        If pipelining is enabled, temporary output buffers are also allocated.
    """
    input_buffers = []
    output_buffers = []
    output_buffers_tmp = []  # For pipelining
    bindings = []
    for binding in trt_engine:
        print("Binding: ", binding)
        size = trt.volume(trt_engine.get_binding_shape(binding))
        dtype = trt.nptype(trt_engine.get_binding_dtype(binding))
        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)

        device_mem = cuda.mem_alloc(host_mem.nbytes)
        # Append the device buffer to device bindings.
        bindings.append(int(device_mem))
        # Append to the appropriate list.
        if trt_engine.binding_is_input(binding):
            input_buffers.append(HostDeviceMem(host_mem, device_mem))
            print('input engine.get_binding_dtype(binding)', trt_engine.get_binding_dtype(binding))
            print('input engine.get_binding_shape(binding)', trt_engine.get_binding_shape(binding))
        else:
            output_buffers.append(HostDeviceMem(host_mem, device_mem))
            print('output engine.get_binding_dtype(binding)', trt_engine.get_binding_dtype(binding))
            print('output engine.get_binding_shape(binding)', trt_engine.get_binding_shape(binding))
            if pipelining:
                output_buffers_tmp.append(np.empty(trt_engine.get_binding_shape(binding), dtype=dtype))

    print("bindings", bindings)
    return input_buffers, output_buffers, bindings, output_buffers_tmp

def infer_trt_engine_nosiamese(inputs, outputs, max_batch_size, bindings, stream, context):
    # Transfer the image input (first binding only) to the GPU.
    for inp in inputs[0:1]:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    # Run inference.
    # context.profiler = trt.Profiler()
    context.execute_async(batch_size=max_batch_size, bindings=bindings, stream_handle=stream.handle)
    # Feed the encoder outputs of frame N into the encoder inputs of frame N+1.
    for i in range(5):
        cuda.memcpy_dtod_async(inputs[i + 1].device, outputs[i].device, outputs[i].host.nbytes, stream)
    # Transfer predictions back from the GPU.
    for out in outputs[5:]:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)

Fiona.Chen · April 20, 2023, 4:58am

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Your model seems to be an LSTM-style model. Can you try Triton Inference Server (Triton Inference Server | NVIDIA Developer) with Gst-nvinferserver (Gst-nvinferserver — DeepStream 6.2 Release documentation)?

It would help if you could provide your model to us for further investigation.

system · May 19, 2023, 2:26am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
