[Feature Request] Retrieve the actual precision chosen for each layer after the engine is built, via IProfiler

  • Currently, TensorRT provides APIs that let the user call “setPrecision” and “setOutputType” on each ILayer obtained from the INetworkDefinition. But this configuration is performed BEFORE the engine is built (see the sketch below).

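For reference, the pre-build configuration looks roughly like this (a minimal sketch; `requestHalfPrecision` is a hypothetical helper name, and exact signatures vary slightly across TensorRT versions):

```cpp
#include "NvInfer.h"

// Request FP16 for every layer of an existing network definition.
// Without strict mode these are only hints: the builder is still free
// to pick a different precision for best performance.
void requestHalfPrecision(nvinfer1::INetworkDefinition* network)
{
    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        layer->setPrecision(nvinfer1::DataType::kHALF);
        for (int j = 0; j < layer->getNbOutputs(); ++j)
            layer->setOutputType(j, nvinfer1::DataType::kHALF);
    }
}
```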
  • But since kernel auto-tuning and layer fusion happen during the “buildCudaEngine()” process, the engine generated in the end has fewer layers, and most importantly, if strict mode is disabled, the precision of each layer is selected automatically for best performance and is not guaranteed to be the precision set by the user. So there is no explicit way to directly query a layer’s precision after the engine is built: no API in ICudaEngine / IProfiler can retrieve such info. All I can do is experiment and guess, crawling through the INFO log for hints (see the logger sketch below)… This is not convenient for optimization.

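The log-crawling workaround looks roughly like this (a sketch only; the text of the build-time messages is not a stable API, which is exactly why a proper query would help):

```cpp
#include <iostream>
#include "NvInfer.h"

// Forward all builder messages, including INFO/VERBOSE ones that
// sometimes hint at the chosen tactics and formats, so they can be
// captured and searched. The `noexcept` qualifier may need to be
// dropped on older TensorRT versions.
class VerboseLogger : public nvinfer1::ILogger
{
public:
    void log(Severity severity, const char* msg) noexcept override
    {
        // Keep everything; filter by severity here if too noisy.
        std::cerr << "[" << static_cast<int>(severity) << "] " << msg << std::endl;
    }
};

// Usage:
// VerboseLogger logger;
// auto builder = nvinfer1::createInferBuilder(logger);
```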
  • Recently, while trying to convert a model to INT8 mode, I found the INT8 model running even slower than FP16 mode. It took me quite a while to figure out that this was because part of the INT8 model was running in FP32 precision to avoid expensive reformat layers. If I could directly query the precision of all layers from the engine, things would be a lot easier.

  • So I wonder if TensorRT could add a feature that lets the user know the actual precision each layer runs at, preferably via IProfiler, so that we can collect both the time and the precision of each layer during profiling (see the sketch below).
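Today, IProfiler only reports per-layer timing; the request is for precision to be exposed alongside it. A minimal sketch of the current interface (the exact signature of reportLayerTime differs slightly across TensorRT versions, e.g. `noexcept` is newer):

```cpp
#include <iostream>
#include "NvInfer.h"

// Collects per-layer execution time during inference. If the requested
// feature were added, the chosen precision could be reported here too.
struct PerLayerProfiler : public nvinfer1::IProfiler
{
    void reportLayerTime(const char* layerName, float ms) noexcept override
    {
        std::cout << layerName << ": " << ms << " ms" << std::endl;
    }
};

// Usage: attach to the execution context before running inference.
// PerLayerProfiler profiler;
// context->setProfiler(&profiler);
```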

Thanks.

I also use mixed precision, but I don’t know how to get the precision of each layer.
I am confused that the score of the INT8 model is slightly different from the FP32 model for some images, although the PR curves are very close.