I am trying to benchmark BERT on TensorRT (using NVIDIA's demoBERT implementation).
My data distribution consists of text of varying lengths, with more than 90% of the samples containing just a single word.
For this I am trying to use dynamic shapes with an optimization profile that favours short sentences but can still handle longer ones, e.g. min (batch_size, 1), opt (batch_size, 4), max (batch_size, 128).
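For context, a profile like this can be sketched with `trtexec` roughly as follows (the ONNX file name, input tensor name `input_ids`, and batch size 8 are placeholder assumptions; demoBERT's own builder script sets profiles programmatically rather than via `trtexec`):

```shell
# Hypothetical sketch: build an engine whose profile favours very short
# sequences (opt = 4 tokens) while still accepting up to 128 tokens.
# "bert.onnx" and the input name "input_ids" are placeholders.
trtexec --onnx=bert.onnx \
        --minShapes=input_ids:8x1 \
        --optShapes=input_ids:8x4 \
        --maxShapes=input_ids:8x128 \
        --saveEngine=bert_dynamic.engine
```

TensorRT tunes kernels for the `opt` shape, so inputs far from it (e.g. length 128 here) may run with less optimal tactics than a fixed-shape engine built for that exact length.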
I observe that in some scenarios inference with the dynamic-shape engine is slower than with a fixed-shape engine. Is this expected?