Build TensorRT on CUDA compute capability 7.5 and make it backward compatible with previous capabilities

Description

I’d like to make a TensorRT engine file work across different compute capabilities. I’ve found that a CUDA application can be built to be backward compatible across different compute capabilities; see this link.
With this in mind, I thought it might be possible to do the same for a TensorRT engine file by building the trtexec tool with support for multiple architectures. However, the build instructions in the TensorRT GitHub repo require a prebuilt package from the NVIDIA Developer Zone that contains libnvinfer, which is not generated from the source code on GitHub. Given this, is making the engine file compatible across compute capabilities simply not possible? If it is possible, is there another approach?
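The only fallback I can think of is to build one engine per target GPU ahead of time and pick the matching file at runtime by compute capability. A minimal sketch of that idea (assuming pycuda is installed; the per-SM file naming scheme is only an illustration):

import pycuda.driver as cuda

cuda.init()
# Query the compute capability of the first GPU, e.g. (7, 5) on a GTX 1650 Ti
# or (6, 1) on a GTX 1050 Ti.
major, minor = cuda.Device(0).compute_capability()
# Hypothetical naming scheme: one engine file serialized per SM version.
engine_path = "mobilenetv2_sm%d%d.engine" % (major, minor)
print("would load:", engine_path)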

Here is how I verified this:

  1. download the model from here
  2. clone the TensorRT GitHub repo and download the prebuilt package needed for the build
  3. build with multiple architectures enabled (GPU_ARCHS is not defined. Generating CUDA code for default SMs: 53;60;61;70;75)
  4. use the trtexec tool to convert an ONNX model on an NVIDIA GeForce GTX 1650 Ti (compute capability 7.5):
trtexec --onnx=mobilenetv2-7.onnx --workspace=64 --fp16 --explicitBatch --saveEngine=mobilenetv2.engine
  5. execute the built engine file on a different machine with an NVIDIA GeForce GTX 1050 Ti (compute capability 6.1):
~/TensorRT-8.2.1.8/bin/trtexec --loadEngine=mobilenetv2.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8201] # /home/mle/TensorRT-8.2.1.8/bin/trtexec --loadEngine=mobilenetv2.engine
[05/19/2022-09:40:27] [I] === Model Options ===
[05/19/2022-09:40:27] [I] Format: *
[05/19/2022-09:40:27] [I] Model: 
[05/19/2022-09:40:27] [I] Output:
[05/19/2022-09:40:27] [I] === Build Options ===
[05/19/2022-09:40:27] [I] Max batch: 1
[05/19/2022-09:40:27] [I] Workspace: 16 MiB
[05/19/2022-09:40:27] [I] minTiming: 1
[05/19/2022-09:40:27] [I] avgTiming: 8
[05/19/2022-09:40:27] [I] Precision: FP32
[05/19/2022-09:40:27] [I] Calibration: 
[05/19/2022-09:40:27] [I] Refit: Disabled
[05/19/2022-09:40:27] [I] Sparsity: Disabled
[05/19/2022-09:40:27] [I] Safe mode: Disabled
[05/19/2022-09:40:27] [I] DirectIO mode: Disabled
[05/19/2022-09:40:27] [I] Restricted mode: Disabled
[05/19/2022-09:40:27] [I] Save engine: 
[05/19/2022-09:40:27] [I] Load engine: mobilenetv2.engine
[05/19/2022-09:40:27] [I] Profiling verbosity: 0
[05/19/2022-09:40:27] [I] Tactic sources: Using default tactic sources
[05/19/2022-09:40:27] [I] timingCacheMode: local
[05/19/2022-09:40:27] [I] timingCacheFile: 
[05/19/2022-09:40:27] [I] Input(s)s format: fp32:CHW
[05/19/2022-09:40:27] [I] Output(s)s format: fp32:CHW
[05/19/2022-09:40:27] [I] Input build shapes: model
[05/19/2022-09:40:27] [I] Input calibration shapes: model
[05/19/2022-09:40:27] [I] === System Options ===
[05/19/2022-09:40:27] [I] Device: 0
[05/19/2022-09:40:27] [I] DLACore: 
[05/19/2022-09:40:27] [I] Plugins:
[05/19/2022-09:40:27] [I] === Inference Options ===
[05/19/2022-09:40:27] [I] Batch: 1
[05/19/2022-09:40:27] [I] Input inference shapes: model
[05/19/2022-09:40:27] [I] Iterations: 10
[05/19/2022-09:40:27] [I] Duration: 3s (+ 200ms warm up)
[05/19/2022-09:40:27] [I] Sleep time: 0ms
[05/19/2022-09:40:27] [I] Idle time: 0ms
[05/19/2022-09:40:27] [I] Streams: 1
[05/19/2022-09:40:27] [I] ExposeDMA: Disabled
[05/19/2022-09:40:27] [I] Data transfers: Enabled
[05/19/2022-09:40:27] [I] Spin-wait: Disabled
[05/19/2022-09:40:27] [I] Multithreading: Disabled
[05/19/2022-09:40:27] [I] CUDA Graph: Disabled
[05/19/2022-09:40:27] [I] Separate profiling: Disabled
[05/19/2022-09:40:27] [I] Time Deserialize: Disabled
[05/19/2022-09:40:27] [I] Time Refit: Disabled
[05/19/2022-09:40:27] [I] Skip inference: Disabled
[05/19/2022-09:40:27] [I] Inputs:
[05/19/2022-09:40:27] [I] === Reporting Options ===
[05/19/2022-09:40:27] [I] Verbose: Disabled
[05/19/2022-09:40:27] [I] Averages: 10 inferences
[05/19/2022-09:40:27] [I] Percentile: 99
[05/19/2022-09:40:27] [I] Dump refittable layers:Disabled
[05/19/2022-09:40:27] [I] Dump output: Disabled
[05/19/2022-09:40:27] [I] Profile: Disabled
[05/19/2022-09:40:27] [I] Export timing to JSON file: 
[05/19/2022-09:40:27] [I] Export output to JSON file: 
[05/19/2022-09:40:27] [I] Export profile to JSON file: 
[05/19/2022-09:40:27] [I] 
[05/19/2022-09:40:27] [I] === Device Information ===
[05/19/2022-09:40:27] [I] Selected Device: NVIDIA GeForce GTX 1050 Ti
[05/19/2022-09:40:27] [I] Compute Capability: 6.1
[05/19/2022-09:40:27] [I] SMs: 6
[05/19/2022-09:40:27] [I] Compute Clock Rate: 1.4175 GHz
[05/19/2022-09:40:27] [I] Device Global Memory: 4040 MiB
[05/19/2022-09:40:27] [I] Shared Memory per SM: 96 KiB
[05/19/2022-09:40:27] [I] Memory Bus Width: 128 bits (ECC disabled)
[05/19/2022-09:40:27] [I] Memory Clock Rate: 3.504 GHz
[05/19/2022-09:40:27] [I] 
[05/19/2022-09:40:27] [I] TensorRT version: 8.2.1
[05/19/2022-09:40:27] [I] [TRT] [MemUsageChange] Init CUDA: CPU +158, GPU +0, now: CPU 169, GPU 116 (MiB)
[05/19/2022-09:40:27] [I] [TRT] Loaded engine size: 7 MiB
[05/19/2022-09:40:27] [E] Error[6]: The engine plan file is generated on an incompatible device, expecting compute 6.1 got compute 7.5, please rebuild.
[05/19/2022-09:40:27] [E] Error[4]: [runtime.cpp::deserializeCudaEngine::50] Error Code 4: Internal Error (Engine deserialization failed.)
[05/19/2022-09:40:27] [E] Failed to create engine from model.
[05/19/2022-09:40:27] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8201] # /home/mle/TensorRT-8.2.1.8/bin/trtexec --loadEngine=mobilenetv2.engine
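For reference, the same check can be done from the TensorRT Python API, where a failed deserialization returns None. A minimal sketch (file name taken from the commands above):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)
with open("mobilenetv2.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
if engine is None:
    # Same condition as Error Code 4/6 above: the plan targets compute 7.5,
    # so deserialization fails on the compute 6.1 device.
    print("engine was built for a different GPU; rebuild it on this device")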

Environment

TensorRT Version: 8.2.1.8
GPU Type: NVIDIA GeForce GTX 1650 Ti with Max-Q Design
Nvidia Driver Version: 495.29.05
CUDA Version: 10.2
CUDNN Version: 8.2
Operating System + Version: Ubuntu 20.10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Hi,
Could you share the ONNX model and the script, if not already shared, so that we can assist you better?
In the meantime, you can try a few things:

  1. validate your model with the snippet below

check_model.py

import onnx

filename = "yourONNXmodel.onnx"  # replace with the path to your model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid
  2. try running your model with the trtexec command

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
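For example, with the model from this thread:

trtexec --onnx=mobilenetv2-7.onnx --verbose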
Thanks!

The model can be found here: models/mobilenetv2-7.onnx at main · onnx/models · GitHub

Hi,

The generated engine files are not portable across platforms or TensorRT versions. Plans are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version), so if you want to run an engine on a different GPU, you must rebuild it on that GPU.
Please refer to the link below for more details.
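In practice, this means shipping the ONNX model and building the engine on each target machine. A minimal sketch with the TensorRT Python API (the file names and the 64 MiB workspace size are placeholders taken from this thread):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path="mobilenetv2-7.onnx", engine_path="mobilenetv2.engine"):
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch network, matching trtexec --explicitBatch
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse the ONNX model")
    config = builder.create_builder_config()
    config.max_workspace_size = 64 << 20  # 64 MiB, as in --workspace=64
    # The plan produced here is specific to the GPU in this machine.
    serialized_engine = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)

build_engine()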

Thanks!
