End-to-End AI for NVIDIA-Based PCs: CUDA and TensorRT Execution Providers in ONNX Runtime

Originally published at: https://developer.nvidia.com/blog/end-to-end-ai-for-nvidia-based-pcs-cuda-and-tensorrt-execution-providers-in-onnx-runtime/

This post is the fourth in a series about optimizing end-to-end AI. The previous post in the End-to-End AI for NVIDIA-Based PCs series described the higher-level idea behind ONNX and ONNX Runtime, and explained that there are multiple execution providers (EPs) in ONNX Runtime that enable the use of hardware-specific features or optimizations…


Thanks for the great blog post. Assuming a previously generated TensorRT engine, will ONNX Runtime with the TensorRT EP achieve the same runtime performance as running the engine directly through the TensorRT API? In other words, is there any performance penalty to using TensorRT through ONNX Runtime?

If your engine is not split up by ONNX Runtime, the performance should be the same. Essentially, if an ONNX file cannot be compiled into a single engine, ONNX Runtime slices up the network and falls back to the CUDA execution provider for the unsupported ops.
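For reference, here is a minimal sketch of how that priority ordering is set up, assuming a C++ build of ONNX Runtime with TensorRT support; the model path and cache directory are placeholders. Appending the TensorRT EP before the CUDA EP means TensorRT takes every subgraph it can compile, and the CUDA EP picks up the rest:

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "trt_ep_priority");
  Ort::SessionOptions opts;

  // Registered first, the TensorRT EP gets first pick: every subgraph it
  // can compile to an engine runs there.
  OrtTensorRTProviderOptions trt_options{};
  trt_options.device_id = 0;
  trt_options.trt_engine_cache_enable = 1;            // reuse previously built engines
  trt_options.trt_engine_cache_path = "./trt_cache";  // placeholder directory
  opts.AppendExecutionProvider_TensorRT(trt_options);

  // Registered second, the CUDA EP receives the ops TensorRT cannot take.
  OrtCUDAProviderOptions cuda_options{};
  cuda_options.device_id = 0;
  opts.AppendExecutionProvider_CUDA(cuda_options);

  // "model.onnx" is a placeholder path.
  Ort::Session session(env, ORT_TSTR("model.onnx"), opts);
  return 0;
}
```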
There are a few things to watch out for:

  1. TensorRT in ONNX Runtime is not asynchronous by default, meaning you will waste valuable CPU time blocking on the GPU.
  2. How do you provide data to TensorRT? You want to ensure that PCIe traffic and execution overlap by using CUDA streams and CUDA events. This is, in my opinion, a little more natural with pure TRT, but it is certainly possible with ONNX Runtime, as demonstrated here: ProViz-AI-Samples/cuda_sample.cpp at master · NVIDIA/ProViz-AI-Samples · GitHub. A minimal sketch follows after this list.
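
To illustrate both points, here is a minimal sketch, assuming ONNX Runtime's C++ API with the TensorRT EP and the CUDA Toolkit headers available; the model path, tensor names ("input"/"output"), and shapes are hypothetical placeholders. It hands the EP a user CUDA stream, binds device buffers with IoBinding so Run() performs no implicit copies, and, assuming an ONNX Runtime version that supports the disable_synchronize_execution_providers run option, keeps Run() from blocking on the stream:

```cpp
#include <onnxruntime_cxx_api.h>
#include <cuda_runtime.h>
#include <array>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "trt_overlap");

  // One stream orders the H2D copy and the inference without blocking the CPU.
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  Ort::SessionOptions opts;
  OrtTensorRTProviderOptions trt_options{};
  trt_options.device_id = 0;
  trt_options.has_user_compute_stream = 1;  // run the engine on our stream
  trt_options.user_compute_stream = stream;
  opts.AppendExecutionProvider_TensorRT(trt_options);

  Ort::Session session(env, ORT_TSTR("model.onnx"), opts);  // placeholder path

  // Hypothetical 1x3x224x224 float input and 1x1000 float output.
  const size_t in_count = 3 * 224 * 224, out_count = 1000;
  float *d_in = nullptr, *d_out = nullptr;
  cudaMalloc(&d_in, in_count * sizeof(float));
  cudaMalloc(&d_out, out_count * sizeof(float));

  std::vector<float> h_in(in_count, 0.f);
  // Async copy on the same stream: it overlaps with other CPU work and is
  // sequenced before the inference that follows on this stream.
  cudaMemcpyAsync(d_in, h_in.data(), in_count * sizeof(float),
                  cudaMemcpyHostToDevice, stream);

  // Bind device buffers so Run() performs no implicit host<->device copies.
  Ort::MemoryInfo mem_info("Cuda", OrtDeviceAllocator, 0, OrtMemTypeDefault);
  std::array<int64_t, 4> in_shape{1, 3, 224, 224};
  std::array<int64_t, 2> out_shape{1, 1000};
  Ort::Value in_tensor = Ort::Value::CreateTensor<float>(
      mem_info, d_in, in_count, in_shape.data(), in_shape.size());
  Ort::Value out_tensor = Ort::Value::CreateTensor<float>(
      mem_info, d_out, out_count, out_shape.data(), out_shape.size());

  Ort::IoBinding binding(session);
  binding.BindInput("input", in_tensor);    // hypothetical tensor names
  binding.BindOutput("output", out_tensor);

  // Ask ONNX Runtime not to synchronize at the end of Run()
  // (available in recent ONNX Runtime versions).
  Ort::RunOptions run_opts;
  run_opts.AddConfigEntry("disable_synchronize_execution_providers", "1");

  session.Run(run_opts, binding);

  cudaStreamSynchronize(stream);  // block only when the result is needed
  cudaFree(d_in);
  cudaFree(d_out);
  cudaStreamDestroy(stream);
  return 0;
}
```

The key design point is that everything (copy, execution, readback) is queued on one stream, so the CPU only waits at the final synchronize rather than inside every Run() call.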