TensorRT generating many small kernels from same ONNX on A6000 vs RTX5000

thromadka · August 15, 2023, 11:40am

Description

When deploying the open-source YOLOv4 ONNX model (trained on COCO) on the RTX A6000 (Ampere), we see many tiny TensorRT kernels in Nsight and low SM utilization compared to what we see on the Quadro RTX 5000. Our theory is that the TensorRT optimizer is failing to apply appropriate kernel fusion.
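One way to check the fusion theory without relying only on the profiler timeline is to compare the layer lists of the two built engines. The sketch below is a minimal, unofficial helper: it assumes each engine's layer information was exported to JSON with a top-level "Layers" list (entries either plain strings or objects with a "Name" field), and that fused layers are reported with " + " joining the original node names. Both are assumptions about the export format, and the function names are hypothetical.

```python
import json


def load_layer_info(path):
    """Read an exported layer-info JSON file into a dict."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


def layer_names(layer_info):
    """Return layer names from a parsed layer-info dict.

    Assumption: a top-level "Layers" list whose entries are either
    plain strings or objects carrying a "Name" field.
    """
    names = []
    for entry in layer_info.get("Layers", []):
        if isinstance(entry, dict):
            names.append(str(entry.get("Name", "")))
        else:
            names.append(str(entry))
    return names


def fusion_summary(layer_info):
    """Count total layers and layers that look fused.

    Assumption: a fused layer's name joins the original node names
    with " + " (for example "conv1 + relu1").
    """
    names = layer_names(layer_info)
    fused = [name for name in names if " + " in name]
    return {"layers": len(names), "fused": len(fused)}


def compare_engines(info_a, info_b):
    """Return both summaries plus the difference in layer count (b - a)."""
    summary_a = fusion_summary(info_a)
    summary_b = fusion_summary(info_b)
    return {
        "a": summary_a,
        "b": summary_b,
        "extra_layers_in_b": summary_b["layers"] - summary_a["layers"],
    }
```

If the files were produced on each GPU with something like trtexec's `--profilingVerbosity=detailed --exportLayerInfo=<file>.json`, then `compare_engines(load_layer_info("rtx5000.json"), load_layer_info("a6000.json"))` would show whether the A6000 engine really has more, less-fused layers (the file names here are placeholders). A similar layer count on both GPUs would suggest the small kernels come from tactic selection rather than missing fusion.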

Environment

GPU Type: RTX A6000 (Ampere) vs Quadro RTX 5000 (Turing)

spolisetty · August 29, 2023, 11:26am

The architectural differences between the Ampere (RTX A6000) and Turing (Quadro RTX 5000) GPUs can affect how TensorRT optimizes models, resulting in variations in kernel generation.

Make sure you are running the most recent version of TensorRT. Newer versions frequently include kernel generation optimizations and improvements.
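To confirm which TensorRT version is actually installed on each machine before comparing results, a small check like the following may help. It is only a sketch: it assumes TensorRT was installed as a Python package named `tensorrt`, and the helper names are hypothetical.

```python
from importlib import metadata
from itertools import takewhile


def installed_version(package="tensorrt"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Convert a dotted version string into a tuple of ints for comparison.

    Parsing stops at the first component that does not start with a digit.
    """
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def same_major_minor(version_a, version_b):
    """True when two version strings share major and minor numbers."""
    return version_tuple(version_a)[:2] == version_tuple(version_b)[:2]
```

Running `installed_version()` on both the A6000 and the RTX 5000 hosts, then passing the two strings to `same_major_minor`, shows whether the comparison is between like-for-like TensorRT builds. When the Python bindings are importable, `tensorrt.__version__` reports the same information.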

Please refer to the TensorRT developer guide for more information.
