TensorRT is slower than PyTorch

PyTorch inference is 100x faster than TensorRT for this model.

# my model code
import torch.nn as nn

# dim_embedding must be divisible by 32 for nn.GroupNorm(32, dim_embedding)
dblock = nn.Sequential(
    nn.Conv1d(dim_embedding, dim_embedding, 8, 4, 2, bias=False),
    nn.GroupNorm(32, dim_embedding),
    nn.ReLU(inplace=True),
    nn.Conv1d(dim_embedding, dim_embedding, 8, 4, 2, bias=False),
    nn.GroupNorm(32, dim_embedding),
    nn.ReLU(inplace=True),
    nn.Conv1d(dim_embedding, dim_embedding, 8, 4, 2, bias=False),
    nn.GroupNorm(32, dim_embedding),
    nn.ReLU(inplace=True),
)
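For reference, this is roughly how I measure the PyTorch side — a minimal timing sketch, assuming an example `dim_embedding = 256` and input length 1024 (the real values are not shown in the post). Note that on GPU, `torch.cuda.synchronize()` must be called before reading the clock, otherwise the asynchronous kernel launches make PyTorch look faster than it is.

```python
import time
import torch
import torch.nn as nn

dim_embedding = 256  # example value; must be divisible by 32 for GroupNorm(32, ...)

dblock = nn.Sequential(
    nn.Conv1d(dim_embedding, dim_embedding, 8, 4, 2, bias=False),
    nn.GroupNorm(32, dim_embedding),
    nn.ReLU(inplace=True),
    nn.Conv1d(dim_embedding, dim_embedding, 8, 4, 2, bias=False),
    nn.GroupNorm(32, dim_embedding),
    nn.ReLU(inplace=True),
    nn.Conv1d(dim_embedding, dim_embedding, 8, 4, 2, bias=False),
    nn.GroupNorm(32, dim_embedding),
    nn.ReLU(inplace=True),
).eval()

x = torch.randn(1, dim_embedding, 1024)  # example input length

with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        dblock(x)
    n = 100
    t0 = time.perf_counter()
    for _ in range(n):
        dblock(x)
    # On GPU, insert torch.cuda.synchronize() here before reading the clock;
    # CUDA kernels are launched asynchronously.
    elapsed_ms = (time.perf_counter() - t0) / n * 1e3

print(f"mean PyTorch latency: {elapsed_ms:.3f} ms")
```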

I use volksdep to load the engine.
Below is the output after running 'trtexec --loadEngine=my engine file':

[09/14/2021-16:26:29] [I] === Performance summary ===
[09/14/2021-16:26:29] [I] Throughput: 2409.12 qps
[09/14/2021-16:26:29] [I] Latency: min = 0.373535 ms, max = 1.46655 ms, mean = 0.383044 ms, median = 0.379639 ms, percentile(99%) = 0.421509 ms
[09/14/2021-16:26:29] [I] End-to-End Host Latency: min = 0.381104 ms, max = 1.4812 ms, mean = 0.395188 ms, median = 0.392273 ms, percentile(99%) = 0.434494 ms
[09/14/2021-16:26:29] [I] Enqueue Time: min = 0.364685 ms, max = 1.46265 ms, mean = 0.37836 ms, median = 0.375671 ms, percentile(99%) = 0.416306 ms
[09/14/2021-16:26:29] [I] H2D Latency: min = 0.00878906 ms, max = 0.0286865 ms, mean = 0.0103995 ms, median = 0.010376 ms, percentile(99%) = 0.0112305 ms
[09/14/2021-16:26:29] [I] GPU Compute Time: min = 0.357361 ms, max = 1.45093 ms, mean = 0.366774 ms, median = 0.363525 ms, percentile(99%) = 0.40448 ms
[09/14/2021-16:26:29] [I] D2H Latency: min = 0.00488281 ms, max = 0.041687 ms, mean = 0.00587121 ms, median = 0.00592041 ms, percentile(99%) = 0.0065918 ms
[09/14/2021-16:26:29] [I] Total Host Walltime: 3.00067 s
[09/14/2021-16:26:29] [I] Total GPU Compute Time: 2.65141 s
[09/14/2021-16:26:29] [W] * Throughput may be bound by Enqueue Time rather than GPU Compute and the GPU may be under-utilized.
[09/14/2021-16:26:29] [W] If not already in use, --useCudaGraph (utilize CUDA graphs where possible) may increase the throughput.
[09/14/2021-16:26:29] [I] Explanations of the performance metrics are printed in the verbose logs.

Environment

TensorRT Version: 8.0.1.6
GPU Type: V100-32G
Nvidia Driver Version: 450.119.04
CUDA Version: 10.2
CUDNN Version: 8.0.2
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.7.9
PyTorch Version (if applicable): 1.6.0

Update: after replacing nn.GroupNorm(32, dim_embedding) with nn.BatchNorm1d(dim_embedding), TensorRT inference is faster than PyTorch.
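The swapped block looks like this (a sketch, again assuming an example `dim_embedding = 256`). One caveat: this is not a drop-in replacement numerically — BatchNorm1d normalizes per channel using running batch statistics, while GroupNorm normalizes each sample over channel groups — so accuracy has to be re-validated, not just latency.

```python
import torch
import torch.nn as nn

dim_embedding = 256  # example value

# Same architecture with GroupNorm swapped for BatchNorm1d.
# BatchNorm1d uses running statistics over the batch; GroupNorm normalizes
# each sample over channel groups, so the two are not equivalent.
dblock_bn = nn.Sequential(
    nn.Conv1d(dim_embedding, dim_embedding, 8, 4, 2, bias=False),
    nn.BatchNorm1d(dim_embedding),
    nn.ReLU(inplace=True),
    nn.Conv1d(dim_embedding, dim_embedding, 8, 4, 2, bias=False),
    nn.BatchNorm1d(dim_embedding),
    nn.ReLU(inplace=True),
    nn.Conv1d(dim_embedding, dim_embedding, 8, 4, 2, bias=False),
    nn.BatchNorm1d(dim_embedding),
    nn.ReLU(inplace=True),
).eval()

with torch.no_grad():
    y = dblock_bn(torch.randn(1, dim_embedding, 1024))
print(y.shape)  # torch.Size([1, 256, 16])
```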

Hi @616403121,

Please refer to the following doc to check the layers supported by TensorRT:
https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#supported-ops

Thank you.