Low FPS on Jetson Nano using TensorRT


I recently got my hands on a Jetson Nano and deployed a simple image classification model with only 3 classes, which I created in Keras. I followed this blog to convert it to TensorRT with FP32 precision.
I ran inference using a webcam; the model loaded in approx. 14 seconds and averaged 7.5 FPS while using 1.5 GB of RAM.
I also ran inference with the TensorFlow model. It loaded in approx. 4 minutes, and I had to increase the swapfile size to 6 GB to meet its memory demand after it exhausted the 4 GB of RAM, or else the process would get killed. This TF model averaged 17 FPS.
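A minimal sketch of how such an FPS average can be measured (the `measure_fps` helper and the `process_frame` callable are illustrative, not from the blog):

```python
import time

def measure_fps(process_frame, frames):
    """Average frames-per-second over a sequence of frames.

    process_frame: callable standing in for preprocessing + inference.
    frames: any iterable of frames (e.g. read from a webcam loop).
    """
    frames = list(frames)
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```
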

The question is: why is TensorRT not giving better FPS even though it is optimized? Am I missing something?



It’s recommended to test your model with trtexec first to see the optimal performance.
Assuming you have generated the ONNX model from the blog shared above, please try these commands:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
$ /usr/src/tensorrt/bin/trtexec --onnx=[your/model]
$ /usr/src/tensorrt/bin/trtexec --onnx=[your/model] --fp16



Here is the output

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=cnn3.onnx
    [08/24/2020-15:57:12] [I] === Model Options ===
    [08/24/2020-15:57:12] [I] Format: ONNX
    [08/24/2020-15:57:12] [I] Model: cnn3.onnx
    [08/24/2020-15:57:12] [I] Output:
    [08/24/2020-15:57:12] [I] === Build Options ===
    [08/24/2020-15:57:12] [I] Max batch: 1
    [08/24/2020-15:57:12] [I] Workspace: 16 MB
    [08/24/2020-15:57:12] [I] minTiming: 1
    [08/24/2020-15:57:12] [I] avgTiming: 8
    [08/24/2020-15:57:12] [I] Precision: FP32
    [08/24/2020-15:57:12] [I] Calibration: 
    [08/24/2020-15:57:12] [I] Safe mode: Disabled
    [08/24/2020-15:57:12] [I] Save engine: 
    [08/24/2020-15:57:12] [I] Load engine: 
    [08/24/2020-15:57:12] [I] Builder Cache: Enabled
    [08/24/2020-15:57:12] [I] NVTX verbosity: 0
    [08/24/2020-15:57:12] [I] Inputs format: fp32:CHW
    [08/24/2020-15:57:12] [I] Outputs format: fp32:CHW
    [08/24/2020-15:57:12] [I] Input build shapes: model
    [08/24/2020-15:57:12] [I] Input calibration shapes: model
    [08/24/2020-15:57:12] [I] === System Options ===
    [08/24/2020-15:57:12] [I] Device: 0
    [08/24/2020-15:57:12] [I] DLACore: 
    [08/24/2020-15:57:12] [I] Plugins:
    [08/24/2020-15:57:12] [I] === Inference Options ===
    [08/24/2020-15:57:12] [I] Batch: 1
    [08/24/2020-15:57:12] [I] Input inference shapes: model
    [08/24/2020-15:57:12] [I] Iterations: 10
    [08/24/2020-15:57:12] [I] Duration: 3s (+ 200ms warm up)
    [08/24/2020-15:57:12] [I] Sleep time: 0ms
    [08/24/2020-15:57:12] [I] Streams: 1
    [08/24/2020-15:57:12] [I] ExposeDMA: Disabled
    [08/24/2020-15:57:12] [I] Spin-wait: Disabled
    [08/24/2020-15:57:12] [I] Multithreading: Disabled
    [08/24/2020-15:57:12] [I] CUDA Graph: Disabled
    [08/24/2020-15:57:12] [I] Skip inference: Disabled
    [08/24/2020-15:57:12] [I] Inputs:
    [08/24/2020-15:57:12] [I] === Reporting Options ===
    [08/24/2020-15:57:12] [I] Verbose: Disabled
    [08/24/2020-15:57:12] [I] Averages: 10 inferences
    [08/24/2020-15:57:12] [I] Percentile: 99
    [08/24/2020-15:57:12] [I] Dump output: Disabled
    [08/24/2020-15:57:12] [I] Profile: Disabled
    [08/24/2020-15:57:12] [I] Export timing to JSON file: 
    [08/24/2020-15:57:12] [I] Export output to JSON file: 
    [08/24/2020-15:57:12] [I] Export profile to JSON file: 
    [08/24/2020-15:57:12] [I] 

Input filename:   cnn3.onnx
ONNX IR version:  0.0.4
Opset version:    8
Producer name:    tf2onnx
Producer version: 1.6.3
Model version:    0
Doc string:       

[08/24/2020-15:57:17] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/24/2020-15:57:19] [E] [TRT] Network has dynamic or shape inputs, but no optimization profile has been defined.
[08/24/2020-15:57:19] [E] [TRT] Network validation failed.
[08/24/2020-15:57:19] [E] Engine creation failed
[08/24/2020-15:57:19] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=cnn3.onnx



May I know how you run the TensorRT inference in the original post?
It looks like your model uses a dynamic shape, is that correct?
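For reference, a "dynamic shape" means one or more input dimensions were left unspecified when the model was exported (commonly the batch dimension), which is why trtexec reports "Network has dynamic or shape inputs, but no optimization profile has been defined." A toy sketch of the distinction (the `has_dynamic_dims` helper name is just for illustration):

```python
def has_dynamic_dims(shape):
    """True if any dimension of the input shape is unknown.

    Dynamic dimensions typically appear as None, -1, or a symbolic
    name (e.g. "N") in an exported ONNX model's input shape.
    """
    return any(d is None or d == -1 or isinstance(d, str) for d in shape)

# A fixed shape like [1, 228, 228, 3] builds without an optimization
# profile; [None, 228, 228, 3] (unknown batch) is dynamic and needs one.
```
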



Here is the inference code

# imports assumed from the blog's setup: cv2, tensorrt, and its eng/inf helper modules
import cv2
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt_runtime = trt.Runtime(TRT_LOGGER)

serialized_plan_fp32 = "model/cnn_model.plan"
# image size used in training 
img_size = 228
HEIGHT = img_size
WIDTH = img_size

engine = eng.load_engine(trt_runtime, serialized_plan_fp32)
h_input, d_input, h_output, d_output, stream = inf.allocate_buffers(engine, 1, trt.float32)

vs = cv2.VideoCapture(0) 

while True:
	_, frame = vs.read()
	image = process_img(frame)

	pred = inf.do_inference(engine, image, h_input, d_input, h_output, d_output, stream, 1, HEIGHT, WIDTH)
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF
	if key == ord("q"):
		break

vs.release()
cv2.destroyAllWindows()

To be honest, I don’t know what a dynamic shape is. Deep learning is still relatively new to me; I’m currently pursuing my undergraduate degree.

Thanks for your time.


Could you share the source of do_inference?
If it is implemented in a library, would you mind sharing which module you import?


The do_inference code is the same as in the above-mentioned blog.

def do_inference(engine, pics_1, h_input_1, d_input_1, h_output, d_output, stream, batch_size, height, width):
    """This is the function to run the inference.
       engine : The TensorRT engine
       pics_1 : Input images to the model.
       h_input_1: Input in the host
       d_input_1: Input in the device
       h_output: Output in the host
       d_output: Output in the device
       stream: CUDA stream
       batch_size : Batch size for execution time
       height: Height of the output image
       width: Width of the output image
       Returns the list of output images.
    """
    load_images_to_buffer(pics_1, h_input_1)

    with engine.create_execution_context() as context:
        # Transfer input data to the GPU.
        cuda.memcpy_htod_async(d_input_1, h_input_1, stream)

        # Run inference.
        context.profiler = trt.Profiler()
        context.execute(batch_size=batch_size, bindings=[int(d_input_1), int(d_output)])

        # Transfer predictions back from the GPU.
        cuda.memcpy_dtoh_async(h_output, d_output, stream)
        # Synchronize the stream so the async copy has finished.
        stream.synchronize()
        # Return the host output.
        out = h_output
        return out


Sorry for the missing information.

It looks like you already have an engine file model/cnn_model.plan.
So you can benchmark the inference directly with the following command:

/usr/src/tensorrt/bin/trtexec --loadEngine=cnn_model.plan
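trtexec reports per-inference latency; for a single-stream, batch-1 run, throughput in FPS is just the reciprocal of the mean latency. A trivial conversion (hypothetical helper name) for comparing against the webcam numbers above:

```python
def latency_ms_to_fps(mean_latency_ms):
    """Convert a mean per-inference latency in milliseconds to FPS
    (single stream, batch size 1)."""
    return 1000.0 / mean_latency_ms
```

Note that the end-to-end webcam FPS will be lower than this, since it also includes capture, preprocessing, and display time.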