Hi. I’m using the Python API to test TensorRT performance. Is there a way to know which optimizations were applied to the model? For example, which layers were fused and which precision was used (INT8, FP16 and so on)? As far as I know, not all optimizations are always possible, so I’d like to see what was done for different models.
You can set the logger severity to INFO to get more details about the optimizations.
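As a minimal sketch of the logging suggestion, the helper below creates a TensorRT logger at INFO severity, or at VERBOSE severity, which usually prints more of the builder's decisions. The helper name `make_trt_logger` is hypothetical and only for illustration; `trt.Logger` and its severity levels come from the TensorRT Python package, which is assumed to be installed.

```python
def make_trt_logger(verbose: bool = False):
    """Return a TensorRT logger at INFO severity, or VERBOSE if requested.

    make_trt_logger is a hypothetical helper name, not part of TensorRT.
    """
    # Imported here so this file can be loaded on a machine without TensorRT.
    import tensorrt as trt

    severity = trt.Logger.VERBOSE if verbose else trt.Logger.INFO
    return trt.Logger(severity)
```

The logger is then passed to the builder as usual, for example `trt.Builder(make_trt_logger(verbose=True))`, and the extra messages appear while the engine is being built.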
You can also use the “trtexec” command-line tool to understand performance and possibly locate bottlenecks.
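For the trtexec route, here is a rough sketch that assembles a command line from Python. It assumes the model is an ONNX file (the name `model.onnx` is a placeholder) and that your trtexec build supports the listed flags; availability differs between TensorRT versions, so check `trtexec --help` first. The function name `build_trtexec_command` is hypothetical.

```python
import shlex


def build_trtexec_command(onnx_path, fp16=False, int8=False):
    """Build a trtexec argument list that asks for layer and timing details.

    Hypothetical helper; the flags are assumed to exist in your trtexec version.
    """
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",            # input model, assumed to be ONNX
        "--verbose",                      # detailed build log
        "--profilingVerbosity=detailed",  # keep per-layer details in the engine
        "--dumpLayerInfo",                # print layer information of the engine
        "--dumpProfile",                  # print per-layer timings
    ]
    if fp16:
        cmd.append("--fp16")  # allow FP16 kernels
    if int8:
        cmd.append("--int8")  # allow INT8 kernels
    return cmd


# Print the command so it can be copied into a shell.
print(shlex.join(build_trtexec_command("model.onnx", fp16=True)))
```

To run it directly, pass the list to `subprocess.run(...)`. Comparing the output with and without `--fp16` or `--int8` is one way to see which layers actually changed precision.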
Please find the below links for your reference: