PyTorch JIT also claims to optimize CUDA kernels by fusing smaller ones into larger ones. Has anybody done a comparison between the throughput gain from PyTorch → JIT and PyTorch → TensorRT?
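For anyone wanting to run this comparison themselves, here is a minimal benchmarking sketch (not from this thread): it measures throughput for an eager model, a TorchScript-traced version, and a Torch-TensorRT-compiled version. The model choice (resnet18), batch size, and iteration counts are arbitrary assumptions, and the `torch_tensorrt.compile` call assumes the Torch-TensorRT package is installed; its exact signature and default IR can vary across versions.

```python
import time
import torch
import torchvision.models as models
import torch_tensorrt  # assumption: Torch-TensorRT is installed (pip install torch-tensorrt)

def throughput(model, x, iters=100, warmup=10):
    """Images/sec over `iters` forward passes, after `warmup` passes.

    Warmup also gives the TorchScript fuser time to specialize the graph,
    so the timed loop reflects steady-state performance.
    """
    with torch.no_grad():
        for _ in range(warmup):
            model(x)
        torch.cuda.synchronize()  # make sure all queued kernels finish before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()  # wait for the last kernel before stopping the clock
        elapsed = time.perf_counter() - start
    return iters * x.shape[0] / elapsed

batch = 32  # assumption: pick whatever batch size matches your workload
x = torch.randn(batch, 3, 224, 224, device="cuda")
eager = models.resnet18().eval().cuda()

# TorchScript: tracing produces a graph the JIT can fuse on later runs.
scripted = torch.jit.trace(eager, x)

# Torch-TensorRT: compiles supported subgraphs into TensorRT engines.
trt = torch_tensorrt.compile(
    eager,
    inputs=[torch_tensorrt.Input(x.shape)],
    enabled_precisions={torch.float},
)

for name, m in [("eager", eager), ("jit", scripted), ("tensorrt", trt)]:
    print(f"{name}: {throughput(m, x):.1f} img/s")
```

In practice the gap between the two tends to be workload-dependent: the JIT fuser mainly merges pointwise ops, while TensorRT additionally applies kernel auto-tuning and precision lowering, so measuring on your own model and batch size is the only reliable answer.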