Difference between TensorRT and PyTorch CUDA usage (mixed precision)

I have a small question. If we use PyTorch with CUDA (mixed precision), what is the purpose of TensorRT?

As far as I understand, TensorRT provides FP16 optimizations as well as INT8 inference, where the INT8 model comes from either quantization-aware training or post-training quantization.
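To make the post-training quantization idea concrete, here is a minimal sketch in plain Python of symmetric per-tensor INT8 quantization. This is only an illustration of the arithmetic, not TensorRT's API or its actual calibration procedure; the function names `quantize_int8` and `dequantize_int8` and the sample weights are made up for this example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Hypothetical helper for illustration; real toolkits pick the scale
    from calibration data rather than a simple max over the values.
    """
    max_abs = max(abs(v) for v in values)
    # One shared scale for the whole tensor; guard against an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


# Made-up sample weights, only for demonstration.
weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The rounding error per value is bounded by half of the scale.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The point of the sketch is that INT8 trades a small, bounded rounding error for smaller and faster arithmetic. Mixed precision with FP16 is a different trade-off, since it keeps floating-point values and only reduces their width.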

Does PyTorch use TensorRT for mixed precision? Or are they separate programs, developed by different companies, that are both used for mixed precision? Or does TensorRT use different methods?

Hi, please refer to the links below to perform inference in INT8.


Thank you very much. I will refer to them. If I run into a problem, I will post again.