Currently, in order to build a PLAN file with TensorRT, you need to do it on the target platform that you intend to run inference on.
This is a hassle in a number of ways, especially if you want to be able to test your model on different GPUs.
Would it be possible to add an option to TensorRT to be able to “cross-build” PLAN files? At the expense of suboptimal optimization, of course.
This would be similar to what NVCC can already do, where you tell it which compute capability (CC) you want to build your CUDA code for. That way you could, for example, build a PLAN file on your laptop and quickly copy it over to e.g. a DrivePX to run inference on it.
Unfortunately, that capability isn’t available at the moment.
However, consider the process you describe: building all engines on one device, copying the engine files over to the target devices, and then running inference on each device. Wouldn’t it take equal, if not less, time to copy the conversion script (or use trtexec) plus the model file to all devices and run build + inference there? In fact, you’d even be able to build multiple engines in parallel this way, whereas building all engines on one device would have to be done sequentially.
For example, if you’re already copying over a file, ssh’ing into each device, and then running inference, couldn’t you combine your build and inference steps into a wrapper bash script, copy over the model + script, and run that wrapper for build + inference on each device?
It seems like the same amount of work, and you don’t have the “expense of suboptimal optimization” either.
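For concreteness, here is a minimal sketch of such a wrapper, written in Python for illustration rather than bash. It assumes trtexec is on the PATH of each device, that the model is an ONNX file, and that run_inference.py is a hypothetical inference script that loads the resulting PLAN file; adapt the commands to your actual build and inference steps.

```python
# Minimal sketch of the suggested build + inference wrapper (Python instead
# of bash). Assumes trtexec is on PATH; "run_inference.py" is a hypothetical
# inference script that loads the generated PLAN file.
import subprocess
import sys

def build_and_run(onnx_path: str, plan_path: str) -> None:
    # Build the engine on the target device itself, so no cross-build is needed.
    subprocess.run(
        ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={plan_path}"],
        check=True,
    )
    # Run inference immediately with the freshly built engine.
    subprocess.run(
        [sys.executable, "run_inference.py", "--engine", plan_path],
        check=True,
    )

if __name__ == "__main__":
    build_and_run("model.onnx", "model.plan")
```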
That’s correct, I could make a script that does the building plus the inference. There are a few drawbacks, though:
- We need to deploy temporary calibration caches that are not part of the final application.
- It currently takes around 15 minutes per network to generate an INT8 PLAN file on the DDPX, even when using the calibration cache (see the sketch after this list). This is not the case when building PLAN files on an x86 computer; I blame the slow ARM cores of the DDPX. With a few networks to deploy, the total build time can reach a few hours, which is not very practical.
- We want to be able to deploy by simply copying over only the necessary built files that compose the application, and then immediately run the application without having to wait for anything else. Building can be done at any other time, perhaps even overnight.
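To illustrate the calibration-cache point: the only INT8 artifact needed at build time is the cache itself, which can be replayed through a calibrator that never touches calibration data. A rough sketch using the TensorRT Python API follows; the class name and cache file name are made up for illustration.

```python
# Rough sketch (TensorRT Python API; names are illustrative): an INT8
# calibrator that only replays a pre-generated calibration cache, so the
# calibration dataset itself never has to be copied to the target; only
# the small cache file does.
import tensorrt as trt

class CacheOnlyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, cache_path: str = "calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.cache_path = cache_path

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        # No calibration data on the target: returning None means TensorRT
        # must rely entirely on the cache supplied below.
        return None

    def read_calibration_cache(self):
        # TensorRT checks the cache first; a valid cache skips calibration batches.
        with open(self.cache_path, "rb") as f:
            return f.read()

    def write_calibration_cache(self, cache):
        # The cache already exists and is treated as read-only here.
        pass

# Plugged into a builder config, e.g.:
#   config.set_flag(trt.BuilderFlag.INT8)
#   config.int8_calibrator = CacheOnlyCalibrator("calib.cache")
```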
In general, I think it would be good to give the user an option to “not run optimization” and instead use “a reasonably fast” implementation for each layer. Similar to NVCC, a “PLAN file compiler” could choose reasonably fast implementations for each requested CC.
This would also be beneficial in terms of reproducibility; right now PLAN files are not reproducible, since building them depends on the state of the target they are built on, which is not deterministic. Being able to choose a “reproducible, but not the fastest possible” implementation is very valuable in my opinion.
To be honest, I actually see some flaws in the optimization process: you optimize based on the “current” state of your target machine, but then you might run the engine on another target machine (of the same architecture) that is in a “different” state.
Example: you have a DDPX in a build farm and a DDPX in the car. The DDPX in the build farm has no other processes running on it, and you build a PLAN file there. Then you copy that PLAN file over to the DDPX in the car, where other processes are running. The state is therefore different, and perhaps the optimization that was done in the build farm no longer holds when you run it in the car?
I realize this is an old topic, but I wanted to second the request for cross-building TensorRT plan files. I have a Jetson Nano 2GB, and the ONNX->TensorRT build step runs out of memory and gets killed. It would be great to be able to build the plan file on a machine with more RAM and faster CPUs and then just transfer it to the target for inference.