Hi,
Is there an API for fetching the INT8 calibration scale factors for a given tensor after the INT8 calibration process has run?
Currently this information is encoded in the INT8 calibration cache, but the existing API only exposes the cache as a raw pointer to a buffer. According to the docs, the calibration cache is an “internal implementation detail”, so I take it I should not reverse engineer it to obtain these scale factors, since the format can change at any time at NVIDIA’s discretion.
My use case is:
- Build DLA engine + Safety + INT8.
- Due to Safety, the network must be reformat-free.
- Therefore, the I/O tensors are INT8.
- I want to end up with FP32 tensors → I need to create the INT8 → FP32 reformatting layer myself.
- I can only accomplish that if I know the correct scale factors to apply to convert from INT8 to FP32. This information is obtained somewhere in the INT8 calibration process - how do I get it?
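To make the last step concrete, here is a minimal sketch of the conversion I have in mind. It assumes symmetric per-tensor quantization with no zero point, so that FP32 = INT8 * scale. The function name `dequantize_int8` and the scale value are placeholders of mine, not anything from the TensorRT API; the scale is exactly the number I don't know how to obtain.

```python
from typing import Iterable, List


def dequantize_int8(values: Iterable[int], scale: float) -> List[float]:
    """Convert INT8 values to FP32, assuming symmetric quantization:
    x_fp32 = x_int8 * scale (no zero-point offset)."""
    out = []
    for v in values:
        # Reject anything outside the signed 8-bit range.
        if not -128 <= v <= 127:
            raise ValueError(f"value {v} is outside the INT8 range")
        out.append(float(v) * scale)
    return out


# Placeholder scale: in my real use case this must come from calibration.
HYPOTHETICAL_SCALE = 0.05

int8_output = [-127, 0, 64, 127]
fp32_output = dequantize_int8(int8_output, HYPOTHETICAL_SCALE)
```

If the actual scheme uses per-channel scales or a different convention, the multiplication would need to change accordingly; I am assuming a single scale per I/O tensor.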
Thanks!