I have a small question. If we use PyTorch with CUDA (mixed precision), what is the purpose of TensorRT?
TensorRT provides INT8 optimizations (via quantization-aware training and post-training quantization) as well as FP16 optimizations.
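To make clear what I mean by post-training INT8 quantization, here is a minimal sketch in plain Python. It assumes simple symmetric per-tensor scaling; the function names `quantize_int8` and `dequantize` are my own for illustration, and this is not TensorRT's actual calibration procedure.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale for the whole tensor (illustrative assumption).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the INT8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [x * scale for x in q]


weights = [0.02, -1.27, 0.64, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded away for the smaller integer representation.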
Does PyTorch use TensorRT for mixed precision? Or are they separate tools, developed by different companies, that both support mixed precision? Or does TensorRT use different methods altogether?