Our customers use different devices (1080 Ti, 2080 Ti, 3080, Jetson AGX, …), and I want to know how to shorten the engine build time.
I’m upgrading my TensorRT version from 7 to 8, and I found there is a new feature called the timing cache. Does this feature mean I can use the cache across different devices, e.g. generate the cache on a 2080 Ti and use it on a 3080? If not, what is the best way to shorten the engine build time?
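For context, `trtexec` (bundled with TensorRT 8) can save and reload a serialized timing cache via `--timingCacheFile`, which speeds up rebuilds on the same device; a minimal sketch, where `model.onnx` and the file paths are placeholders (whether a cache built on one GPU model is reusable on another is exactly the question here):

```shell
# First build: tactic timings are measured and written to timing.cache.
trtexec --onnx=model.onnx --saveEngine=model.plan \
        --timingCacheFile=timing.cache

# A later rebuild on the same GPU reloads timing.cache and skips
# most of the kernel auto-tuning work, so the build finishes faster.
trtexec --onnx=model.onnx --saveEngine=model_fp16.plan --fp16 \
        --timingCacheFile=timing.cache
```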
Environment
TensorRT Version: 8.4.1.5
GPU Type: 1080 Ti, 2080 Ti, 3080, Jetson AGX
Nvidia Driver Version: 516.40
CUDA Version: 11.1
CUDNN Version: 8
Operating System + Version: Windows 10 + Ubuntu 20.04 + Jetson
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Hi,
My model needs to support all of these devices (1080, 2080, 3080). I remember that a plan file cannot be shared across devices with different compute capabilities, right?
We give customers our ONNX models and they generate the plan file on their own devices. They find it takes a long time to generate the plan file, and I think the timing cache might solve this. Is that correct?
If not, do you have any ideas for solving this problem?
Hi @spolisetty ,
If I want to shorten the engine build time for FP16, is it enough to just enable the timing cache and increase the workspace?
If so, how much would you recommend increasing the workspace by?