Note: this question applies to any ONNX model, so I’m not providing one here.
When building an engine, the profiling step can take a long time because TensorRT benchmarks different tactics and algorithms against the machine’s hardware configuration. This makes building on a client’s machine at deployment time less than ideal, since our users may have to wait an unpredictable amount of time. Thus:
1. In general, are there any tips & tricks for speeding up this process?
2. Is there a reliable way to measure the progress of the profiling process? One possible route is to inspect the verbose logs and count the percentage of layers completed, but that may not be an accurate measurement.
3. Per the section on compatibility of serialized engines, TensorRT has a way to check whether a pre-built engine may be compatible with the current hardware. Are these checks available somewhere as a library/API call? We’d like to consider hosting several versions of the model, each optimized for a different setup, and using these checks to load the one most likely to be compatible (a rough sketch of what we have in mind follows below).
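To make question 3 more concrete, below is a sketch of the kind of selection logic we have in mind. It only relies on calls I believe exist today (`cudaGetDeviceProperties` from the CUDA runtime and TensorRT’s global `getInferLibVersion()`); the engine file naming scheme and the selection policy are entirely hypothetical, and we would still treat a successful deserialization as the final check, since `deserializeCudaEngine` returns null for an incompatible engine.

```cpp
// Hypothetical sketch: pick the pre-built engine most likely to match this
// machine, based on GPU compute capability and the TensorRT version linked
// at run time. The file naming scheme below is made up for illustration.
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <cstdint>
#include <string>

std::string pickEngineFile()
{
    // Compute capability of the first visible GPU, e.g. 8.6 for an RTX 30xx card.
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, 0);

    // TensorRT library version actually loaded at run time, encoded as
    // major * 1000 + minor * 100 + patch (e.g. 8202 for 8.2.2).
    int32_t const trtVersion = getInferLibVersion();

    // Hypothetical naming convention: one engine per (SM, TensorRT) combination.
    return "model_sm" + std::to_string(prop.major) + std::to_string(prop.minor)
         + "_trt" + std::to_string(trtVersion) + ".engine";
}
```

Ideally, though, we would rather reuse whatever checks TensorRT itself performs than duplicate them ourselves.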
Environment
TensorRT Version: 8.2.2.1
GPU Type: Variable
Nvidia Driver Version: Variable
CUDA Version: 11.6
CUDNN Version: 8
Operating System + Version: Ubuntu 20.04, Windows 10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Hi, the question is applicable to all models, so I’m looking for solutions/practices that can be applied generally. The profiling performance I’m referring to can be reproduced with the trtexec command. If a specific model is still needed, one can, for example, use any of these ResNet ones.
Also, to clarify the question further, it’s not about the inference performance but rather about the time it takes TensorRT to profile a network during the building phase.
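To illustrate the kind of answer I’m hoping for on question 1: one mechanism I’m aware of, but have not yet validated for our use case, is reusing a serialized timing cache across builds so that tactic timings measured once don’t have to be re-measured. Below is a rough sketch assuming the `ITimingCache` API that TensorRT 8 exposes on `IBuilderConfig` (error handling and object cleanup omitted):

```cpp
// Rough sketch: reuse a serialized timing cache between engine builds so that
// tactic timings measured in an earlier build can shorten the profiling phase.
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

void buildWithTimingCache(nvinfer1::IBuilder& builder,
                          nvinfer1::IBuilderConfig& config,
                          nvinfer1::INetworkDefinition& network,
                          std::string const& cachePath)
{
    // Load a previously serialized timing cache, if one exists (empty otherwise).
    std::vector<char> blob;
    std::ifstream in(cachePath, std::ios::binary);
    if (in)
        blob.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());

    // An empty blob yields a fresh cache; a non-empty one restores prior timings.
    nvinfer1::ITimingCache* cache = config.createTimingCache(blob.data(), blob.size());
    config.setTimingCache(*cache, /*ignoreMismatch=*/false);

    // Layers whose timings are already cached should skip re-profiling here.
    nvinfer1::IHostMemory* plan = builder.buildSerializedNetwork(network, config);

    // Persist the (possibly updated) cache for the next build.
    nvinfer1::IHostMemory* serialized = cache->serialize();
    std::ofstream out(cachePath, std::ios::binary);
    out.write(static_cast<char const*>(serialized->data()), serialized->size());

    // ... use 'plan' and release objects per the TensorRT version's ownership rules.
}
```

If I read the tooling correctly, trtexec exposes the same mechanism through a timing-cache file option, though I haven’t confirmed that on 8.2.2.1. I’d be interested to hear whether this (or anything else) meaningfully reduces the build time in practice.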