Description
I’d like to make a TensorRT engine file work across different compute capabilities. I’ve found that a CUDA application can be built to be backward compatible across different compute capabilities; see this link.
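For reference, the CUDA-side mechanism works by embedding machine code (SASS) for several SM versions, plus PTX for forward compatibility, into one fat binary. A sketch of such an nvcc invocation, assuming a hypothetical source file my_kernel.cu:

```shell
# Compile one binary carrying SASS for several compute capabilities,
# plus PTX for compute_75 so newer GPUs can JIT-compile it at load time.
nvcc my_kernel.cu -o my_app \
    -gencode arch=compute_53,code=sm_53 \
    -gencode arch=compute_61,code=sm_61 \
    -gencode arch=compute_75,code=sm_75 \
    -gencode arch=compute_75,code=compute_75
```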
With this in mind, I thought it might be possible to do the same for a TensorRT engine file by building the trtexec tool with multi-architecture support. However, after following the build instructions from the TensorRT GitHub repo, I found that the build requires a prebuilt package from the NVIDIA Developer Zone containing libnvinfer, which is not generated from the source code on GitHub. Because of this, it seems that making the engine file compatible across compute capabilities is not possible. If it is possible, is there another approach?
This is how I verified it:
- Download the model from here.
- Clone the TensorRT GitHub repo and download the prebuilt package required for the build.
- Build with multiple architectures enabled (GPU_ARCHS is not defined. Generating CUDA code for default SMs: 53;60;61;70;75).
- Use the trtexec tool to convert an ONNX model on an NVIDIA GeForce GTX 1650 Ti (compute capability 7.5):
trtexec --onnx=mobilenetv2-7.onnx --workspace=64 --fp16 --explicitBatch --saveEngine=mobilenetv2.engine
- Execute the built engine file on a different machine with an NVIDIA GeForce GTX 1050 Ti (compute capability 6.1):
~/TensorRT-8.2.1.8/bin/trtexec --loadEngine=mobilenetv2.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # /home/mle/TensorRT-8.2.1.8/bin/trtexec --loadEngine=mobilenetv2.engine
[05/19/2022-09:40:27] [I] === Model Options ===
[05/19/2022-09:40:27] [I] Format: *
[05/19/2022-09:40:27] [I] Model:
[05/19/2022-09:40:27] [I] Output:
[05/19/2022-09:40:27] [I] === Build Options ===
[05/19/2022-09:40:27] [I] Max batch: 1
[05/19/2022-09:40:27] [I] Workspace: 16 MiB
[05/19/2022-09:40:27] [I] minTiming: 1
[05/19/2022-09:40:27] [I] avgTiming: 8
[05/19/2022-09:40:27] [I] Precision: FP32
[05/19/2022-09:40:27] [I] Calibration:
[05/19/2022-09:40:27] [I] Refit: Disabled
[05/19/2022-09:40:27] [I] Sparsity: Disabled
[05/19/2022-09:40:27] [I] Safe mode: Disabled
[05/19/2022-09:40:27] [I] DirectIO mode: Disabled
[05/19/2022-09:40:27] [I] Restricted mode: Disabled
[05/19/2022-09:40:27] [I] Save engine:
[05/19/2022-09:40:27] [I] Load engine: mobilenetv2.engine
[05/19/2022-09:40:27] [I] Profiling verbosity: 0
[05/19/2022-09:40:27] [I] Tactic sources: Using default tactic sources
[05/19/2022-09:40:27] [I] timingCacheMode: local
[05/19/2022-09:40:27] [I] timingCacheFile:
[05/19/2022-09:40:27] [I] Input(s)s format: fp32:CHW
[05/19/2022-09:40:27] [I] Output(s)s format: fp32:CHW
[05/19/2022-09:40:27] [I] Input build shapes: model
[05/19/2022-09:40:27] [I] Input calibration shapes: model
[05/19/2022-09:40:27] [I] === System Options ===
[05/19/2022-09:40:27] [I] Device: 0
[05/19/2022-09:40:27] [I] DLACore:
[05/19/2022-09:40:27] [I] Plugins:
[05/19/2022-09:40:27] [I] === Inference Options ===
[05/19/2022-09:40:27] [I] Batch: 1
[05/19/2022-09:40:27] [I] Input inference shapes: model
[05/19/2022-09:40:27] [I] Iterations: 10
[05/19/2022-09:40:27] [I] Duration: 3s (+ 200ms warm up)
[05/19/2022-09:40:27] [I] Sleep time: 0ms
[05/19/2022-09:40:27] [I] Idle time: 0ms
[05/19/2022-09:40:27] [I] Streams: 1
[05/19/2022-09:40:27] [I] ExposeDMA: Disabled
[05/19/2022-09:40:27] [I] Data transfers: Enabled
[05/19/2022-09:40:27] [I] Spin-wait: Disabled
[05/19/2022-09:40:27] [I] Multithreading: Disabled
[05/19/2022-09:40:27] [I] CUDA Graph: Disabled
[05/19/2022-09:40:27] [I] Separate profiling: Disabled
[05/19/2022-09:40:27] [I] Time Deserialize: Disabled
[05/19/2022-09:40:27] [I] Time Refit: Disabled
[05/19/2022-09:40:27] [I] Skip inference: Disabled
[05/19/2022-09:40:27] [I] Inputs:
[05/19/2022-09:40:27] [I] === Reporting Options ===
[05/19/2022-09:40:27] [I] Verbose: Disabled
[05/19/2022-09:40:27] [I] Averages: 10 inferences
[05/19/2022-09:40:27] [I] Percentile: 99
[05/19/2022-09:40:27] [I] Dump refittable layers:Disabled
[05/19/2022-09:40:27] [I] Dump output: Disabled
[05/19/2022-09:40:27] [I] Profile: Disabled
[05/19/2022-09:40:27] [I] Export timing to JSON file:
[05/19/2022-09:40:27] [I] Export output to JSON file:
[05/19/2022-09:40:27] [I] Export profile to JSON file:
[05/19/2022-09:40:27] [I]
[05/19/2022-09:40:27] [I] === Device Information ===
[05/19/2022-09:40:27] [I] Selected Device: NVIDIA GeForce GTX 1050 Ti
[05/19/2022-09:40:27] [I] Compute Capability: 6.1
[05/19/2022-09:40:27] [I] SMs: 6
[05/19/2022-09:40:27] [I] Compute Clock Rate: 1.4175 GHz
[05/19/2022-09:40:27] [I] Device Global Memory: 4040 MiB
[05/19/2022-09:40:27] [I] Shared Memory per SM: 96 KiB
[05/19/2022-09:40:27] [I] Memory Bus Width: 128 bits (ECC disabled)
[05/19/2022-09:40:27] [I] Memory Clock Rate: 3.504 GHz
[05/19/2022-09:40:27] [I]
[05/19/2022-09:40:27] [I] TensorRT version: 8.2.1
[05/19/2022-09:40:27] [I] [TRT] [MemUsageChange] Init CUDA: CPU +158, GPU +0, now: CPU 169, GPU 116 (MiB)
[05/19/2022-09:40:27] [I] [TRT] Loaded engine size: 7 MiB
[05/19/2022-09:40:27] [E] Error[6]: The engine plan file is generated on an incompatible device, expecting compute 6.1 got compute 7.5, please rebuild.
[05/19/2022-09:40:27] [E] Error[4]: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
[05/19/2022-09:40:27] [E] Failed to create engine from model.
[05/19/2022-09:40:27] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # /home/mle/TensorRT-8.2.1.8/bin/trtexec --loadEngine=mobilenetv2.engine
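The Error[6] message above ("generated on an incompatible device ... please rebuild") indicates that a serialized plan is tied to the compute capability it was built on. The usual workaround is to build one engine per target GPU (running trtexec on each device) and pick the matching file at load time. A minimal sketch of that selection step; the mobilenetv2_smXY.engine naming scheme is my own assumption, not a TensorRT convention:

```shell
# Given a compute capability string such as "6.1", print the name of the
# engine file built for that architecture (assumed naming: mobilenetv2_smXY.engine).
engine_for_cc() {
    cc=$1    # e.g. "6.1", as reported in trtexec's "Compute Capability" line
    echo "mobilenetv2_sm$(echo "$cc" | tr -d '.').engine"
}

# e.g. on the GTX 1050 Ti above:
#   ~/TensorRT-8.2.1.8/bin/trtexec --loadEngine="$(engine_for_cc 6.1)"
```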
Environment
TensorRT Version: 8.2.1.8
GPU Type: NVIDIA GeForce GTX 1650 Ti with Max-Q Design
Nvidia Driver Version: 495.29.05
CUDA Version: 10.2
CUDNN Version: 8.2
Operating System + Version: Ubuntu 20.10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):