I upgraded from TensorRT 7.2.3.4 to TensorRT 8.0.1.6.
In my setup, after initialization (which includes a first inference), inference is run twice in a row.
When running these two inferences back to back, the first one is very slow: it takes more than twice as long as the second.
With TensorRT 7, both inferences took the same amount of time.
I see the same symptom on both a Jetson Nano and a PC.
Is there a behavioral difference between TensorRT 7 and 8 that explains this?
Environment
TensorRT Version: 8.0.1.6
GPU Type: Jetson Nano, GTX 1060
Nvidia Driver Version:
CUDA Version: 10.2
CUDNN Version: 8.1 (PC), 8.2 (Jetson Nano)
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
for (int i = 0; i < 2; ++i) {
    // do preprocess
    inference(data);
    // do postprocess
}
If I wait about 15 seconds or more after the for loop and then run it again, the first cudaMemcpyHostToDevice is very slow.
But if I run the for loop again within about 15 seconds, it is very fast.
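For reference, this is roughly how I measure the per-call timings. It is only a minimal sketch: inference() and the buffer arguments are placeholders for my actual code, and it assumes a synchronous cudaMemcpy followed by a blocking inference call.

#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

void inference(void* dev_in);  // placeholder for the actual TensorRT execution

void run_once(void* dev_in, const void* host_in, size_t bytes) {
    using clock = std::chrono::steady_clock;

    // Time the host-to-device copy separately, since this is the call that is slow.
    auto t0 = clock::now();
    cudaMemcpy(dev_in, host_in, bytes, cudaMemcpyHostToDevice);
    auto t1 = clock::now();

    inference(dev_in);
    cudaDeviceSynchronize();  // make sure the GPU work has finished before reading the clock
    auto t2 = clock::now();

    printf("H2D: %.3f ms, inference: %.3f ms\n",
           std::chrono::duration<double, std::milli>(t1 - t0).count(),
           std::chrono::duration<double, std::milli>(t2 - t1).count());
}

With this I can see that it is the cudaMemcpyHostToDevice in the first iteration that accounts for most of the extra time after an idle period.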
I believe this is expected behaviour; the very first run usually takes longer because of one-time setup work.
If you still consider this an issue, could you please collect an Nsight Systems profile so that we can take a closer look?
Also, please share a minimal repro that we can try on our end.
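For example, a profile collected with something along the lines of

nsys profile --trace=cuda,cudnn,nvtx -o report ./your_app

(where ./your_app stands in for your application) should show whether the extra time is spent inside the cudaMemcpyHostToDevice itself or in other calls that run before it.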