GTC 2020: Optimizing TensorRT Conversion for Real-Time Inference on Autonomous Vehicles

GTC 2020 S22198
Presenters: Dheeraj Peri, NVIDIA; Josh Park, NVIDIA; Zejia Zheng, Zoox; Jeff Pyke, Zoox
Abstract
TensorRT optimizes neural-network computation for deployment on GPUs, but not all operations are supported. Reduced-precision inference speeds up computation but can cause regressions in accuracy. We'll introduce the Zoox TensorRT conversion pipeline, which addresses these problems. TensorRT compatibility checks are invoked in the early stages of neural-network training to ensure that incompatible ops are discovered before time and resources are wasted on full-scale training. Inference accuracy checks can be invoked at each layer to identify operations that are not friendly to reduced-precision computation. Detailed profiling reveals unnecessary computations that aren't optimized inside TensorRT but can be optimized by simple code changes during graph construction. With this pipeline, we've successfully provided TensorRT conversion support for neural networks performing various perception tasks on the Zoox autonomous driving platform.
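
The abstract doesn't describe how these checks are implemented. As a hedged illustration only, the sketch below shows one way an early compatibility check could look, assuming the network is exported to ONNX and parsed with TensorRT's Python ONNX parser; the ONNX route, file path, and function name are assumptions, not Zoox's actual pipeline.

```python
# Illustrative sketch only -- assumes an ONNX export of the network;
# the pipeline described in the talk may be implemented differently.
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def check_trt_compatibility(onnx_path: str) -> bool:
    """Try to parse the exported network with TensorRT's ONNX parser so
    unsupported ops surface before full-scale training begins."""
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        parsed = parser.parse(f.read())

    if not parsed:
        # Report every op TensorRT could not parse or does not support.
        for i in range(parser.num_errors):
            print(parser.get_error(i))
    return parsed

if __name__ == "__main__":
    # Hypothetical model path; a check like this could run right after
    # the first training step rather than after full-scale training.
    check_trt_compatibility("perception_model.onnx")
```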
