Description
A clear and concise description of the bug or issue.
Environment
TensorRT Version: 8.4.2
GPU Type: A100
Nvidia Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 8
Operating System + Version: SLES “15-SP2” in host machine
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 1.13.0a0+d321be6
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/pytorch:22.08-py3
Hi. I converted PyTorch RetinaFace and ArcFace models to TensorRT with the torch_tensorrt library. Everything works at first, but after some iterations inference stalls and the time to process one image increases sharply (more than 10x).
Here is a snippet that simulates inference:
import torch
import torch_tensorrt  # must be imported so the TensorRT-backed TorchScript modules can be loaded
import time

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the two modules that were compiled with torch_tensorrt
retinaface_model = torch.jit.load('../jit_retinaface_trt.torch-tensorrt')
retinaface_model.eval()
retinaface_model.to(DEVICE)

arcface_model = torch.jit.load('../arcface_bs1_torch.float32.torch-tensorrt')
arcface_model.eval()
arcface_model.to(DEVICE)

# Random inputs with the shapes the models expect
retinaface_tensor = torch.rand(1, 3, 360, 640).to(DEVICE)
arcface_tensor = torch.rand(1, 3, 112, 112).to(DEVICE)

for _ in range(100):
    global_start = time.time()

    # Time the RetinaFace forward pass
    start_time = time.time()
    with torch.no_grad():
        ret_out = retinaface_model(retinaface_tensor)
    torch.cuda.synchronize()  # wait for the GPU before stopping the clock
    end_time = time.time()
    ret_time = end_time - start_time

    # Time the ArcFace forward pass
    start_time = time.time()
    with torch.no_grad():
        arc_out = arcface_model(arcface_tensor)
    torch.cuda.synchronize()
    end_time = time.time()
    arc_time = end_time - start_time

    global_end = time.time()
    global_time = global_end - global_start

    # if global_time > 0.1:
    print(f'ret time is : {ret_time}')
    print(f'arc time is : {arc_time}')
    print(f'global time is : {global_time}')
    print('-'*40)
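To pin down exactly when the slowdown begins, it may help to record every iteration's latency with its index instead of only printing it. The following is a minimal sketch, not part of torch or torch_tensorrt: `time_calls` is a hypothetical helper, `fn` stands in for a model call such as `lambda: retinaface_model(retinaface_tensor)`, and `sync` stands in for `torch.cuda.synchronize`. The demo uses a CPU stand-in workload so it runs without a GPU.

```python
import time
from typing import Callable, List, Optional


def time_calls(fn: Callable[[], object],
               iterations: int,
               warmup: int = 0,
               sync: Optional[Callable[[], None]] = None) -> List[float]:
    """Return per-call wall-clock latencies in seconds.

    Hypothetical helper for illustration only.
    """
    def run_once() -> float:
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        return time.perf_counter() - start

    for _ in range(warmup):
        run_once()  # warm-up calls are executed but not recorded
    return [run_once() for _ in range(iterations)]


if __name__ == "__main__":
    # Stand-in workload; replace with the model call and pass
    # sync=torch.cuda.synchronize when timing on a GPU.
    latencies = time_calls(lambda: sum(range(10_000)), iterations=5, warmup=2)
    for i, t in enumerate(latencies):
        print(f"iter {i}: {t * 1e3:.3f} ms")
```

`time.perf_counter` is used here instead of `time.time` because it is a monotonic, high-resolution clock intended for measuring short intervals.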
Output:
Normally the output looks like this:
ret time is : 0.0009617805480957031
arc time is : 0.0019981861114501953
global time is : 0.002961874008178711
ret time is : 0.0008959770202636719
arc time is : 0.0019989013671875
global time is : 0.002896547317504883
ret time is : 0.0009148120880126953
arc time is : 0.0020008087158203125
global time is : 0.0029172897338867188
ret time is : 0.0008985996246337891
arc time is : 0.001995086669921875
global time is : 0.002894878387451172
ret time is : 0.00446009635925293
arc time is : 0.002003192901611328
global time is : 0.006464719772338867
ret time is : 0.0009562969207763672
arc time is : 0.0020017623901367188
global time is : 0.0029592514038085938
ret time is : 0.0009098052978515625
arc time is : 0.002006053924560547
global time is : 0.002917051315307617
ret time is : 0.0009250640869140625
arc time is : 0.001997709274291992
global time is : 0.002924203872680664
ret time is : 0.0009291172027587891
arc time is : 0.001995086669921875
global time is : 0.002925395965576172
ret time is : 0.0009377002716064453
arc time is : 0.0020194053649902344
global time is : 0.0029582977294921875
ret time is : 0.0009005069732666016
arc time is : 0.0019958019256591797
global time is : 0.0028977394104003906
ret time is : 0.0009152889251708984
arc time is : 0.001996755599975586
global time is : 0.0029134750366210938
ret time is : 0.0009534358978271484
arc time is : 0.0019991397857666016
global time is : 0.0029540061950683594
ret time is : 0.0009467601776123047
arc time is : 0.0020117759704589844
global time is : 0.002960205078125
ret time is : 0.0008974075317382812
arc time is : 0.0019989013671875
global time is : 0.0028977394104003906
ret time is : 0.0009267330169677734
arc time is : 0.002001523971557617
global time is : 0.0029296875
But after some iterations, the timings become:
ret time is : 0.0030410289764404297
arc time is : 0.10997724533081055 <-----
global time is : 0.11302065849304199
ret time is : 0.002657651901245117
arc time is : 0.1075441837310791 <-----
global time is : 0.11020350456237793
ret time is : 0.1104578971862793 <-----
arc time is : 0.0020885467529296875
global time is : 0.1125497817993164
ret time is : 0.11419057846069336 <-----
arc time is : 0.0020301342010498047
global time is : 0.11622214317321777
ret time is : 0.10733747482299805 <-----
arc time is : 0.0020294189453125
global time is : 0.10936880111694336
ret time is : 0.1150820255279541 <-----
arc time is : 0.0020606517791748047
global time is : 0.11714410781860352
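The jump in the log above can also be located programmatically by comparing each latency with the median of the run. This is only a sketch: `find_spikes` is a hypothetical helper, and the default factor of 10 is an assumption taken from the ">10x" in the description. The sample values are arc times from the log above, rounded.

```python
from statistics import median
from typing import List, Tuple


def find_spikes(latencies: List[float],
                factor: float = 10.0) -> List[Tuple[int, float]]:
    """Return (index, latency) pairs that exceed factor * median latency.

    Hypothetical helper for illustration only.
    """
    if not latencies:
        return []
    baseline = median(latencies)
    return [(i, t) for i, t in enumerate(latencies) if t > factor * baseline]


if __name__ == "__main__":
    # Rounded arc times from the log: four normal iterations, two slow ones.
    arc_times = [0.0020, 0.0020, 0.0020, 0.0020, 0.1100, 0.1075]
    print(find_spikes(arc_times))  # -> [(4, 0.11), (5, 0.1075)]
```

A median baseline only works while slow iterations are the minority. If more than half of the run is slow, the median itself shifts and a fixed threshold (for example, the `0.1` from the commented-out check in the snippet) would be the safer choice.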
I tried raising the GPU clock frequency from the default (765 MHz) to the A100 maximum (1410 MHz), but nothing changed.
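To check whether the clock setting is actually in effect during the slow iterations, the current SM clock could be sampled while the loop runs. The sketch below assumes `nvidia-smi` is on the PATH inside the container. `parse_gpu_sample` and `sample_gpu` are hypothetical helper names, and the choice of fields to query is only a suggestion.

```python
import subprocess
from typing import Dict, Optional

# Query fields understood by `nvidia-smi --query-gpu`.
FIELDS = ["clocks.sm", "clocks.max.sm", "temperature.gpu", "power.draw"]


def parse_gpu_sample(line: str) -> Dict[str, Optional[float]]:
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    sample: Dict[str, Optional[float]] = {}
    for name, value in zip(FIELDS, values):
        try:
            sample[name] = float(value)
        except ValueError:
            sample[name] = None  # e.g. "[N/A]" when a field is unsupported
    return sample


def sample_gpu(index: int = 0) -> Dict[str, Optional[float]]:
    """Run nvidia-smi once and return the parsed sample for one GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--id={index}",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_sample(result.stdout.strip().splitlines()[0])


if __name__ == "__main__":
    try:
        print(sample_gpu())
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"nvidia-smi not available: {exc}")
```

Calling `sample_gpu()` once per iteration adds process-launch overhead to the loop, so it is probably better to sample from a separate terminal or thread and compare timestamps afterwards.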
It would be great if you could help fix this bug. Thanks in advance!