Simple deployment, open source, and extensible – all while pushing the frontier of inference performance.
With record-setting 8× inference speedups, TensorRT LLM v1.0 makes it simple to deliver real-time, cost-efficient LLM inference on NVIDIA GPUs.
GitHub release:
What’s New in v1.0
PyTorch model authorship for rapid development
Modular Python runtime for flexibility
Stable LLM API for seamless deployment
Livestream: Learn More
Date: Sept 25
Time: 5–6 PM (PDT)
Link: TensorRT LLM Livestream: New Easy-To-Use Pythonic Runtime (AddEvent)