NVIDIA provides both demoBERT on TensorRT and BERT on Faster Transformer (FT) / Effective FT (EFT).
It has been observed that demoBERT is the fastest for batch size 1,
while Faster Transformer becomes the better option for larger batch sizes.
What is NVIDIA’s guidance on when to use demoBERT vs. FT/EFT
(for BERT-base and equivalent models)?
If demoBERT is the fastest solution, why should we consider FT/EFT?