Easier. Faster. Open. TensorRT LLM 1.0 is here

Simple deployment, open source, and extensible – all while pushing the frontier of inference performance.

With record-setting 8× inference speedups, TensorRT LLM v1.0 makes it simple to deliver real-time, cost-efficient LLM inference on NVIDIA GPUs.

GitHub release:

What’s New in v1.0

PyTorch model authorship for rapid development

Modular Python runtime for flexibility

Stable LLM API for seamless deployment
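As a quick taste of the LLM API, here is a minimal sketch of generating text with it. The model name is a placeholder, and running this requires a supported NVIDIA GPU with TensorRT LLM installed:

```python
from tensorrt_llm import LLM, SamplingParams

# Load a model by Hugging Face name or local checkpoint path
# (model shown is a placeholder example).
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Control decoding with sampling parameters.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts.
for output in llm.generate(["Hello, my name is"], params):
    print(output.outputs[0].text)
```

Because the API is now stable in v1.0, code written against it should keep working across future releases.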

Livestream: Learn More
Date: Sept 25
Time: 5–6 PM (PDT)
Link: TensorRT LLM Livestream: New Easy-To-Use Pythonic Runtime | AddEvent
