I am a PyTorch user, and I have recently been looking for a way to run inference with a ResNet50 model efficiently on NVIDIA GPUs (I have a 1050 Ti and a 1080 Ti, and potentially a 2080 Ti). As I understand it, modern NVIDIA GPUs can perform int8 operations up to 4x faster than their fp32 equivalents, so I am trying to enable int8 inference for ResNet50.
I am currently overloaded with information. Could somebody please advise me on a proper way to run inference of my ResNet50 in int8 precision with a significant performance gain?
PS
I tried a GLOW-OpenCL build with an ONNX model converted from PyTorch, but saw no significant performance gain after quantization (I may try it again). I have also heard that the TensorRT framework can consume ONNX models; can it help me?
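For reference, this is roughly how I export the model to ONNX (a minimal sketch; the file name and input shape are just what I happened to use):

```python
import torch
import torchvision

# Load a pretrained ResNet50 and switch it to inference mode
model = torchvision.models.resnet50(pretrained=True).eval()

# Dummy input matching the expected shape (batch, channels, height, width)
dummy = torch.randn(1, 3, 224, 224)

# Export to ONNX so the model can be consumed by GLOW, TensorRT, etc.
torch.onnx.export(model, dummy, "resnet50.onnx")
```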
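From skimming the TensorRT docs, I gather the int8 flow through its Python API would look roughly like the sketch below. This is untested on my side, the exact calls (e.g. `build_engine` vs. newer variants) differ between TensorRT versions, and `my_calibrator` is a placeholder: a real int8 build needs a calibrator implementation fed with representative input batches so TensorRT can choose quantization ranges.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Parse the ONNX model into a TensorRT network definition
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("resnet50.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

# Request int8 precision; a calibrator is required for range selection
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = my_calibrator  # hypothetical IInt8EntropyCalibrator2 impl

engine = builder.build_engine(network, config)
```

Apparently there is also a `trtexec` command-line tool that can do a similar build directly from the ONNX file (something like `trtexec --onnx=resnet50.onnx --int8`), which might be an easier first test; is that the recommended route?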