Int4 Precision for AI Inference

Originally published at: Int4 Precision for AI Inference | NVIDIA Technical Blog

INT4 Precision Can Bring an Additional 59% Speedup Compared to INT8

If there’s one constant in AI and deep learning, it’s never-ending optimization to wring every possible bit of performance out of a given platform. Many inference applications benefit from reduced precision, whether it’s mixed precision for recurrent neural networks (RNNs) or INT8 for convolutional…
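For anyone wondering what INT4 quantization actually involves, here is a minimal NumPy sketch of symmetric per-tensor quantization to the signed 4-bit range [-8, 7]. This is only an illustration of the underlying arithmetic, not TensorRT's API (which, as the questions below note, does not publicly expose INT4); the function names and the simple max-abs scale calibration are my own assumptions for the example.

```python
import numpy as np

def quantize_int4(x: np.ndarray):
    """Symmetric per-tensor quantization of float32 values to the
    signed 4-bit integer range [-8, 7] (illustrative only)."""
    # Choose the scale so the largest magnitude maps to the INT4 maximum (7).
    scale = np.abs(x).max() / 7.0
    # Round to the nearest code and clamp to the 4-bit range;
    # codes are stored in int8 containers since NumPy has no int4 dtype.
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Map INT4 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale

x = np.random.randn(8).astype(np.float32)
q, scale = quantize_int4(x)
print("original:   ", x)
print("int4 codes: ", q)
print("dequantized:", dequantize_int4(q, scale))
```

The speedup the article cites comes from the hardware side rather than anything visible in code like this: two 4-bit values pack into each byte, halving memory traffic relative to INT8, and Turing-class GPUs such as the Tesla T4 can execute INT4 math on their Tensor Cores.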

Has this been made available in production in some way? It would be great if I could use INT4 with TensorRT.

I also saw that the Tesla T4 was announced as supporting INT4 inference, but I couldn’t find any usage examples or documentation about INT4 in the TensorRT docs or anywhere else… Could somebody help?