The TensorRT log shows my layers get assigned “Half” precision and not “Int8” - I would like to know why.
Most operations in the model are Squeeze / Unsqueeze / Concat, etc., but it also has Conv2D / Relu. Still, those operations should run faster when the input/output is quantized to 8 bits.
I have a model; the ONNX file was provided in this thread.
I expect that when I ask trtexec to generate an engine file with the “--best” flag, I will get some Int8 layers. But for my model, trtexec produces layers with Half precision.
My question is why doesn’t my model get Int8 layers? Could you please check my model?
I couldn’t understand from the log file why it rejects setting Int8 layers.
Even if I provide a calibration file for the model, it still does not generate Int8 layers.
Sorry for the delayed response.
When we give --best, TensorRT chooses, per layer, whichever of the three precisions (FP32 / FP16 / INT8) runs fastest for that network.
Also, the network consists of many Convs with tiny input channel (C) and output channel (K) counts, something like 1 or 5.
INT8 Tensor Cores require C and K to be padded up to multiples of 32, so most of the computation is wasted and there is no benefit compared to FP16.
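A quick back-of-the-envelope illustration of the padding point above (a sketch, not TensorRT's actual scheduling logic): if C and K are padded to multiples of 32 for INT8 Tensor Cores, the fraction of padded multiply-accumulates doing useful work collapses for channel counts like 1 or 5.

```python
# Illustrative only: how much of the padded INT8 work is useful
# when C and K must be rounded up to a multiple of 32.

def padded(n, multiple=32):
    """Round n up to the next multiple (Tensor Core channel alignment)."""
    return ((n + multiple - 1) // multiple) * multiple

def int8_utilization(c, k, multiple=32):
    """Fraction of the padded MACs that compute real (non-padding) values
    for a conv with C input channels and K output channels."""
    return (c * k) / (padded(c, multiple) * padded(k, multiple))

for c, k in [(1, 1), (5, 5), (32, 32)]:
    print(f"C={c:2d}, K={k:2d} -> utilization {int8_utilization(c, k):.1%}")
# C= 1, K= 1 -> utilization 0.1%
# C= 5, K= 5 -> utilization 2.4%
# C=32, K=32 -> utilization 100.0%
```

At ~0.1% to ~2.4% utilization, any INT8 throughput advantage is swamped by the padding overhead, which is why FP16 kernels win the timing comparison.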
Also, we are not sure why the model requires so many Split ops and tiny Conv ops. If the intent is a grouped or depthwise convolution, a single Conv op with the group attribute set to the number of groups should be used, instead of splitting each group into standalone Conv ops.
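The two patterns compute identical numbers. A minimal pure-Python sketch (framework-free, using a 1x1 conv and made-up shapes for illustration) shows that one grouped conv equals the Split → per-group Conv → Concat chain it replaces:

```python
# Sketch: a grouped 1x1 conv vs. the Split -> Conv -> Concat pattern.
# Tensors are lists of channels, each channel a list of N values.

def conv1x1(x, w):
    """Plain 1x1 conv: x is [C][N], w is [K][C], output is [K][N]."""
    return [[sum(w[k][c] * x[c][n] for c in range(len(x)))
             for n in range(len(x[0]))] for k in range(len(w))]

def grouped_conv1x1(x, w, groups):
    """Single grouped conv (what a Conv with the group attribute computes):
    output channel k reads only the input channels of its own group.
    w is [K][C // groups]."""
    cg, kg = len(x) // groups, len(w) // groups
    out = []
    for k in range(len(w)):
        g = k // kg  # which group this output channel belongs to
        out.append([sum(w[k][c] * x[g * cg + c][n] for c in range(cg))
                    for n in range(len(x[0]))])
    return out

def split_conv_concat(x, w, groups):
    """The pattern seen in the model: split the input into `groups`
    chunks, run a standalone conv per chunk, concatenate the results."""
    cg, kg = len(x) // groups, len(w) // groups
    out = []
    for g in range(groups):
        xg = x[g * cg:(g + 1) * cg]
        wg = w[g * kg:(g + 1) * kg]
        out.extend(conv1x1(xg, wg))
    return out
```

Because both functions produce the same output, fusing the split pattern into one grouped Conv changes nothing numerically, but it lets TensorRT pick one efficient kernel instead of scheduling many tiny Convs with C and K of 1 or 5.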