How to use DLA + INT8 + I/O reformatting?

Hi,

We have a model with FP32 inputs and outputs, and want to run it on DLA in INT8 mode.

If we ran the model on the GPU, TensorRT would insert reformatting layers (FP32 → INT8 at the input and INT8 → FP32 at the output). On DLA, however, this is not done automatically, and the client is responsible for performing the appropriate conversions.
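For concreteness, something like the following is what we have in mind for the build side; a minimal sketch using the TensorRT C++ API, assuming a single input and output (the DLA core index, tensor format, and function name are illustrative, not taken from our actual code):

```cpp
#include "NvInfer.h"
using namespace nvinfer1;

// Configure an INT8 build targeting DLA and declare the network I/O as INT8,
// so the application feeds pre-quantized INT8 data and dequantizes the output itself.
void configureDlaInt8(INetworkDefinition& network, IBuilderConfig& config)
{
    config.setFlag(BuilderFlag::kINT8);
    config.setFlag(BuilderFlag::kGPU_FALLBACK);      // fall back to GPU for unsupported layers
    config.setDefaultDeviceType(DeviceType::kDLA);
    config.setDLACore(0);

    ITensor* input  = network.getInput(0);
    ITensor* output = network.getOutput(0);
    input->setType(DataType::kINT8);
    output->setType(DataType::kINT8);

    // DLA requires specific tensor formats for INT8; kCHW32 is one commonly used choice.
    input->setAllowedFormats(1U << static_cast<int>(TensorFormat::kCHW32));
    output->setAllowedFormats(1U << static_cast<int>(TensorFormat::kCHW32));

    // INT8 scales must still come from somewhere: either an attached calibrator
    // or explicit per-tensor ranges, e.g. input->setDynamicRange(-r, r).
}
```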

How is this done in practice? The FP32 → INT8 conversion requires appropriate scale factors. Where do they come from? Should they be taken from the calibration table produced by the engine build? If not, is there a “standard practice” for doing this? A sketch of what we imagine the host-side conversion would look like follows below.
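This is roughly what we have in mind, assuming the scale is read from the calibration cache entry for the I/O tensor (the cache appears to be a text file with a header line followed by “tensorName: &lt;hex&gt;” lines, where the hex digits are the raw IEEE-754 bits of the FP32 scale). The file path, tensor name, and helper names here are placeholders:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parse the calibration cache and return the scale recorded for `tensorName`.
float scaleFromCalibrationCache(const std::string& cachePath, const std::string& tensorName)
{
    std::ifstream cache(cachePath);
    std::string line;
    std::getline(cache, line);                       // skip the "TRT-..." header line
    while (std::getline(cache, line))
    {
        const auto colon = line.find(": ");
        if (colon == std::string::npos || line.substr(0, colon) != tensorName)
            continue;
        const uint32_t bits = static_cast<uint32_t>(std::stoul(line.substr(colon + 2), nullptr, 16));
        float scale;
        std::memcpy(&scale, &bits, sizeof(scale));   // hex digits are the raw float bits
        return scale;
    }
    throw std::runtime_error("tensor not found in calibration cache: " + tensorName);
}

// Symmetric INT8 quantization: q = clamp(round(x / scale), -128, 127).
std::vector<int8_t> quantize(const std::vector<float>& fp32, float scale)
{
    std::vector<int8_t> int8(fp32.size());
    for (size_t i = 0; i < fp32.size(); ++i)
    {
        const float q = std::round(fp32[i] / scale);
        int8[i] = static_cast<int8_t>(std::max(-128.0f, std::min(127.0f, q)));
    }
    return int8;
}

// Dequantize the INT8 output back to FP32 with the output tensor's scale.
std::vector<float> dequantize(const std::vector<int8_t>& int8, float scale)
{
    std::vector<float> fp32(int8.size());
    for (size_t i = 0; i < int8.size(); ++i)
        fp32[i] = static_cast<float>(int8[i]) * scale;
    return fp32;
}
```

Is this the intended workflow, or are the scales supposed to be obtained through the API (e.g. from the engine/tensor metadata) rather than by parsing the cache?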

Thanks!

Hi, please refer to the links below for performing inference in INT8.

Thanks!