Hi @carlosgalvezp,
Please share your system details in the format below, along with the model file.
o Linux distro and version
o GPU type
o NVIDIA driver version
o CUDA version
o cuDNN version
o Python version [if using python]
o TensorFlow and PyTorch version
o TensorRT version
Also, please note that the UFF parser has been deprecated since TRT 7; hence, we recommend using the ONNX parser.
I know that UFF has been deprecated, but that is irrelevant to this question. Both UFF and ONNX produce an INetworkDefinition when parsed. After that, profiling happens, which is the topic of my question.
However, why is profiling needed for the FP32 engine? That engine doesn’t need to be the fastest one; it just has to be “an” engine that can run in FP32. It should be irrelevant which tactic is chosen: to compute the histogram we only care about the inputs/outputs of each layer, not its internal implementation, right?
@carlosgalvezp
In order to map 32-bit floating-point values to INT8 quantized values, TensorRT needs to know the dynamic range of each activation tensor. The dynamic range is used to determine the appropriate quantization scale. The FP32 engine gives that baseline.
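As a minimal sketch (illustrative only, not TensorRT internals; the function names here are made up), this is the symmetric-quantization relationship between a tensor's dynamic range and its INT8 scale:

```python
import numpy as np

def int8_scale(activations: np.ndarray) -> float:
    """Symmetric quantization: scale = amax / 127, where amax is the
    dynamic range (largest absolute activation value observed)."""
    amax = float(np.max(np.abs(activations)))
    return amax / 127.0

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    """Quantize FP32 values to INT8 by rounding x / scale and clipping."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

# Hypothetical calibration activations for one tensor.
acts = np.array([-4.0, 1.0, 0.0], dtype=np.float32)
s = int8_scale(acts)   # 4.0 / 127
q = quantize(acts, s)  # [-127, 32, 0]
```

Note that the scale depends only on the activation values themselves, which is the crux of the question above.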
I understand that, but that’s not my question. An FP32 model needs to be built, absolutely.
My question is: why does the fastest FP32 model need to be built? Why is profiling needed to build the FP32 model? The dynamic range of activations does not depend on how fast the layers run. It depends only on the inputs/outputs of each layer, so profiling should not be needed.
In other words, if I have e.g. a Conv2D layer, I don’t care which one of the N implementations is chosen. At the end of the day it’s still a Conv2D, which is mathematically well defined, so given a set of inputs it should produce the same set of outputs, and from those we can determine the dynamic range.
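The argument above can be illustrated with a toy example (pure NumPy, not TensorRT; a 1-D convolution stands in for Conv2D): two different implementations of the same convolution produce the same outputs, so the observed dynamic range is the same regardless of which "tactic" is used.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64).astype(np.float32)  # input activations
w = rng.standard_normal(5).astype(np.float32)   # convolution kernel

# "Tactic" A: naive sliding-window loop.
out_a = np.array([np.dot(x[i:i + 5], w) for i in range(len(x) - 4)],
                 dtype=np.float32)

# "Tactic" B: vectorized, im2col-style strided matrix multiply.
cols = np.lib.stride_tricks.sliding_window_view(x, 5)
out_b = (cols @ w).astype(np.float32)

# Same math => same outputs => same dynamic range for calibration,
# independent of which implementation was selected.
amax_a = np.max(np.abs(out_a))
amax_b = np.max(np.abs(out_b))
assert np.allclose(out_a, out_b)
assert np.isclose(amax_a, amax_b)
```

(Up to floating-point rounding differences between summation orders, the dynamic range is implementation-independent, which is the point being made.)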
Thanks! I was hoping to get some answers from the TRT developers but I guess this is as far as it goes :)
My main issue is build time, currently it takes way too long to build the plan files, so I’m looking for ways to optimize that. Caching introduces correctness problems.