GTC 2020: PyTorch-TensorRT: Accelerating Inference in PyTorch with TensorRT

GTC 2020 S21671
Presenters: Josh Park, NVIDIA; Naren Dasan, NVIDIA
TensorRT is a deep-learning inference optimizer and runtime that optimizes networks for GPUs and the NVIDIA Deep Learning Accelerator (DLA). Typically, optimizing a model with TensorRT means first converting the trained model to an intermediate format, such as ONNX, and then parsing that file with a TensorRT parser. This works well for networks built from common architectures and operators; however, with the rapid pace of model development, a DL framework like TensorFlow sometimes has ops that TensorRT does not support. One solution is to implement plugins for these ops. Another is to use a tool like TF-TRT, which converts supportable subgraphs to TensorRT and uses TensorFlow implementations for the rest. We’ll demonstrate the same ability for PyTorch with our new tool, PTH-TRT, and show how it leverages the PyTorch API’s composability features to let users reuse their TensorRT-compatible networks in larger, more complex ones.
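
To make the subgraph idea concrete, here is a toy sketch in plain Python. It is not how TF-TRT or PTH-TRT is implemented: it assumes the model is a simple linear sequence of ops (real graphs are DAGs), and the op names, the SUPPORTED_OPS set, and the partition helper are all hypothetical, chosen only for illustration. It groups consecutive supported ops into segments that could be handed to TensorRT and leaves the rest to the framework.

```python
from itertools import groupby

# Hypothetical set of op names the accelerated backend can handle.
# This is an assumption for illustration, not TensorRT's real support list.
SUPPORTED_OPS = {"conv2d", "batch_norm", "relu", "linear"}


def partition(ops, supported=SUPPORTED_OPS):
    """Split a linear op sequence into maximal runs that are either
    fully supported (offload candidates) or unsupported (framework fallback)."""
    segments = []
    for is_supported, run in groupby(ops, key=lambda op: op in supported):
        backend = "tensorrt" if is_supported else "framework"
        segments.append((backend, list(run)))
    return segments


# A made-up model with one op ("custom_nms") the backend does not support.
model_ops = ["conv2d", "batch_norm", "relu", "custom_nms", "linear", "relu"]
segments = partition(model_ops)

for backend, run in segments:
    print(f"{backend:9s} <- {run}")
```

In this sketch the unsupported op splits the model into two TensorRT candidates with a framework segment between them. A real tool would also have to weigh whether a segment is large enough to be worth converting, which this example ignores.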
