TensorRT inference error on Jetson Nano

Hi.
I exported a .tlt model and generated trt.engine on a Jetson Nano. I get the following error during inference:

[TensorRT] ERROR: 2: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed.)

Hi,

Could you share more information about your use case?
Which model do you use? Please also share the command you executed.

Thanks.

Hi.
I use the YOLOv4 model, and I generate trt.engine with this command:

./tao-converter -k $KEY -p Input,1x3x416x416,8x3x416x416,16x3x416x416 -d 3,416,416 -o BatchedNMS -i nchw -m 1 -c /home/jetsonuser/jp4.6/yolov4_export/cal.bin -e /home/jetsonuser/jp4.6/trt.engine -b 2 -t int8 -w 1073741824 /home/jetsonuser/jp4.6/yolov4_export/final_model.etlt

Inference code:

import tensorrt as trt
import numpy as np
from PIL import Image
import os
import cv2
import pycuda.driver as cuda
import pycuda.autoinit



class HostDeviceMem(object):
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem

    def __str__(self):
        return "Host:\n" + str(self.host) + "\nDevice:\n" + str(self.device)

    def __repr__(self):
        return self.__str__()

class TrtEngine:
    #Initializes TensorRT objects needed for model inference.
    def __init__(self,engine_path, input_height, input_width, input_channels, max_batch_size, dtype):
        
        self.engine_path = engine_path
        self.input_height = input_height
        self.input_width = input_width
        self.input_channels = input_channels
        self.dtype = dtype
        self.logger = trt.Logger(trt.Logger.VERBOSE)
        self.runtime = trt.Runtime(self.logger)
        self.engine = self.load_engine(self.runtime, self.engine_path)
        self.max_batch_size = max_batch_size
        self.inputs, self.outputs, self.bindings, self.stream = self.allocate_buffers()
        self.context = self.engine.create_execution_context()
        # Allocate memory for multiple usage [e.g. multiple batch inference]
        # self.context.set_binding_shape(0, (self.max_batch_size , 3, self.input_height, self.input_width))
        input_volume = trt.volume((self.input_channels, self.input_width, self.input_height))
        self.numpy_array = np.zeros((self.engine.max_batch_size, input_volume))

                
                
    @staticmethod
    def load_engine(trt_runtime, engine_path):
        trt.init_libnvinfer_plugins(None, "")             
        with open(engine_path, 'rb') as f:
            engine_data = f.read()
        engine = trt_runtime.deserialize_cuda_engine(engine_data)
        return engine
    
    def allocate_buffers(self):
        
        inputs = []
        outputs = []
        bindings = []
        stream = cuda.Stream()
        for binding in self.engine:
            # The batch dimension is -1 in this dynamic-shape engine, so the volume
            # comes out negative; flip the sign to get the per-sample element count.
            size = trt.volume(self.engine.get_binding_shape(binding)) * -1
            host_mem = cuda.pagelocked_empty(size, self.dtype)
            device_mem = cuda.mem_alloc(host_mem.nbytes)
            
            bindings.append(int(device_mem))

            if self.engine.binding_is_input(binding):
                inputs.append(HostDeviceMem(host_mem, device_mem))
            else:
                outputs.append(HostDeviceMem(host_mem, device_mem))
        
        return inputs, outputs, bindings, stream
       
            
    def infer_batch(self, image_paths):
        """Infers model on batch of same sized images resized to fit the model.
        Args:
            image_paths (str): paths to images, that will be packed into batch
                and fed into model
        """

        # Make sure the supplied batch size does not exceed the engine's max batch size
        max_batch_size = self.engine.max_batch_size
        actual_batch_size = len(image_paths)
        if actual_batch_size > max_batch_size:
            raise ValueError(
                "image_paths list bigger ({}) than engine max batch size ({})".format(actual_batch_size, max_batch_size))


        # Load all images to CPU...
        imgs = self._load_imgs(image_paths)
        # ...copy them into appropriate place into memory...
        # (self.inputs was returned earlier by allocate_buffers())

        # print("check for more than 1 image")
        # print(len(self.inputs))
        np.copyto(self.inputs[0].host, imgs.ravel().astype(self.dtype))
        
        # ...fetch model outputs...
        input_shape = (1, 3, self.input_height, self.input_width)
        self.context.set_binding_shape(0, input_shape)
        # [detection_out, keep_count_out] = do_inference(
        #     context=self.context, bindings=self.bindings, inputs=self.inputs,
        #     outputs=self.outputs, stream=self.stream)
        # # ...and return results.
        # return detection_out, keep_count_out
        outputs = do_inference(
            context=self.context, bindings=self.bindings, inputs=self.inputs,
            outputs=self.outputs, stream=self.stream)

        # ...and return results.
        return outputs

    def _load_image_into_numpy_array(self, image):
        # (im_width, im_height) = image.size
        # return np.array(image).reshape(
        #     (im_height, im_width, self.input_channels)
        # ).astype(np.uint8)

        return np.array(image, dtype=self.dtype, order='C')

    def _load_imgs(self, image_paths):
        # batch_size = self.engine.max_batch_size
        for idx, image_path in enumerate(image_paths):
            img_np = self._load_img(image_path)
            self.numpy_array[idx] = img_np
        return self.numpy_array

    def _load_img(self, image_path):
        image = Image.open(image_path)
        # Swap the channels from RGB to BGR
        r, g, b = image.split()
        image = Image.merge('RGB', (b, g, r))

        model_input_width = self.input_width
        model_input_height = self.input_height
        # Note: the resampling used by Pillow differs slightly from the one used
        # during training, so the network output may differ slightly if the
        # preprocessing does not match exactly.
        image_resized = image.resize(
            size=(model_input_width, model_input_height),
            resample=Image.BICUBIC
        )
        img_np = self._load_image_into_numpy_array(image_resized)
        # HWC -> CHW
        img_np = img_np.transpose((2, 0, 1))
        # Scale pixel values to the [0.0, 1.0] interval
        img_np = img_np / 255.0
        # img_np = (2.0 / 255.0) * img_np - 1.0
        img_np = img_np.ravel()
        return img_np

# This function is generalized for multiple inputs/outputs.
# inputs and outputs are expected to be lists of HostDeviceMem objects.
def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
    # Transfer input data to the GPU.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference.
    context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
    # Transfer predictions back from the GPU.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Synchronize the stream
    stream.synchronize()

    # Return only the host outputs.
    return [out.host for out in outputs]
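
For completeness, a minimal sketch of how the class above can be driven when appended to the same script. The image path and dtype here are placeholders rather than my exact values; the engine path matches the tao-converter command above.

# Minimal usage sketch (appended to the script above).
# dtype=np.float32 is an assumption about the engine's I/O precision.
if __name__ == "__main__":
    trt_engine = TrtEngine(
        engine_path="/home/jetsonuser/jp4.6/trt.engine",  # engine generated by tao-converter
        input_height=416,
        input_width=416,
        input_channels=3,
        max_batch_size=1,
        dtype=np.float32,
    )
    # Run a single-image batch and print the raw output buffers.
    results = trt_engine.infer_batch(["sample.jpg"])  # placeholder image path
    for out in results:
        print(out.shape, out.dtype)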

Hi,

Could you share the TLT model with us as well?
Thanks.

The file is too large to be uploaded here. This is the link to download the model file:

Sorry, could you please help me?

@AastaLLL

Hi,

Sorry for the delay.

We are checking this internally.
Will share more information with you later.

Thanks.

Hi
Could it be because the board is a legacy model and I have to use an older JetPack version? I used JetPack 4.6.

Hello,
I am having a similar error when I run inference on the engine file.
The engine file is for the object detection model ‘ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8’.

Error:
[TensorRT] ERROR: 2: [pluginV2DynamicExtRunner.cpp::execute::115] Error Code 2: Internal Error (Assertion status == kSTATUS_SUCCESS failed.)

Versions I use:
TensorRT: 8.0.1.6
ONNX: 1.10.2
JetPack: 4.6

I have uploaded my ONNX model, the TRT engine file, and the inference script I am using.

model.trt (12.4 MB)
model.onnx (10.4 MB)
inference.py (5.2 KB)

Please have a look.
Thank you.

Hi,

It seems that you tried to convert an INT8 TensorRT engine on Nano.

Please note that INT8 operations are currently only supported on the Xavier series.
Could you try converting to an FP32 or FP16 engine to see if it works?
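
For example, reusing the paths from your earlier command but dropping the calibration cache (-c) and switching the precision flag, something like:

./tao-converter -k $KEY -p Input,1x3x416x416,8x3x416x416,16x3x416x416 -d 3,416,416 -o BatchedNMS -i nchw -m 1 -e /home/jetsonuser/jp4.6/trt.engine -t fp16 -w 1073741824 /home/jetsonuser/jp4.6/yolov4_export/final_model.etlt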

Thanks

Hi, randhar

Since you are not using a TLT model, the cause should be different.
Would you mind filing a new topic specific to your issue?

Thanks.

Hi,
Yes, I understand, but I get the same error with FP32 and FP16.

Sure. Thanks

Hi,

We want to reproduce this issue internally.
Could you share the $KEY value with us?

Thanks.

Sure,
KEY = ‘bnJ0OG1xcHVrb3N2MGU5b21nZHR2a3ZrMXI6NTkzYjE3YjAtNzllNy00MTk3LTkyNmUtNmJhM2QxNTAyOGEw’

Hi,

We tried the tao-converter with the key shared above but ran into an error.
It seems that the key is incorrect. Could you double-check it?

$ ./tao-converter -k $KEY -p Input,1x3x416x416,8x3x416x416,16x3x416x416 -d 3,416,416 -o BatchedNMS -i nchw -m 16 -e trt.engine -w 1073741824 final_model.tlt
[INFO] [MemUsageChange] Init CUDA: CPU +353, GPU +0, now: CPU 614, GPU 17956 (MiB)
[ERROR] UffParser: Could not parse MetaGraph from /tmp/filebHgqqX
[ERROR] Failed to parse the model, please check the encoding key to make sure it's correct
[ERROR] 4: [network.cpp::validate::2411] Error Code 4: Internal Error (Network must have at least one output)
[ERROR] Unable to create engine
Segmentation fault (core dumped)

Thanks.

Hi.
I am sure the $KEY is correct. I think you should use final_model.etlt instead of final_model.tlt.
This is my command:

./tao-converter -k $KEY -p Input,1x3x416x416,8x3x416x416,16x3x416x416 -d 3,416,416 -o BatchedNMS -i nchw -m 1  -e /home/jetsonuser/jp4.6/trt.engine  -t fp32 -w 1073741824 /home/jetsonuser/jp4.6/yolov4_export/final_model.etlt

Hi,

final_model.tlt is the name of the file we downloaded from the previous comment:

Do you use the same file?

Thanks.