Originally published at: NVIDIA TensorRT for RTX Introduces an Optimized Inference AI Library on Windows 11 | NVIDIA Technical Blog
AI experiences are rapidly expanding on Windows in creativity, gaming, and productivity apps. Several frameworks are available to accelerate AI inference in these apps locally on a desktop, laptop, or workstation, so developers must navigate a broad ecosystem: they have to choose between hardware-specific libraries for maximum performance and cross-vendor frameworks like DirectML, which simplify…
Sounds pretty good. One thing I wonder about, though: for me, final inference speed matters most of all. How will inference speed compare between:
1. Using TensorRT for RTX.
2. Using conventional TensorRT and building a custom engine for each supported GPU model (sketched below).
Will 1) be just as fast as 2)? Or will 2) have performance benefits over 1)?
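For context, here is roughly what I mean by 2): a minimal sketch using the standard TensorRT Python API, assuming an ONNX model as input (the file names are placeholders). The point is that this build step has to be repeated on every GPU model I want to support, since a conventional TensorRT engine is tuned for the GPU it was built on.

```python
# Rough sketch of option 2: build a dedicated TensorRT engine on each target GPU.
# Assumes "model.onnx" exists; run once per supported GPU model, because a
# conventional TensorRT engine embeds kernels selected for the GPU it was built on.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Explicit-batch network (flag is deprecated/no-op on newer TensorRT versions).
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0).desc())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # optional: allow FP16 kernels

# Serialize the engine; the output name is a placeholder for
# "one engine file per supported GPU model".
engine_bytes = builder.build_serialized_network(network, config)
with open("model_for_this_gpu.engine", "wb") as f:
    f.write(engine_bytes)
```

With TensorRT for RTX, my understanding from the post is that I'd build once ahead of time and the library would specialize the engine for the installed RTX GPU at runtime, which is why I'm curious whether that path can match dedicated per-GPU builds.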