When should demoBERT on TensorRT be used vs. BERT on FasterTransformer?


NVIDIA provides both demoBERT on TensorRT and BERT on FasterTransformer (FT) / Effective FT (EFT).

It has been observed that demoBERT is fastest at batch size 1,
while FasterTransformer becomes the better option at larger batch sizes.
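
For anyone who wants to check where this crossover falls on their own hardware, here is a minimal timing sketch. It makes no assumptions about the demoBERT or FT APIs: each backend is just a hypothetical Python callable that takes a batch size and runs one inference, and the names `median_latency` and `fastest_backend_per_batch` are made up for illustration. You would need to wrap your real engines yourself (and synchronize the GPU inside each callable so the timings are meaningful).

```python
import statistics
import time


def median_latency(run, batch_size, repeats=5, warmup=1, clock=time.perf_counter):
    """Median wall-clock time of run(batch_size) over `repeats` timed calls."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        run(batch_size)
    samples = []
    for _ in range(repeats):
        start = clock()
        run(batch_size)
        samples.append(clock() - start)
    return statistics.median(samples)


def fastest_backend_per_batch(backends, batch_sizes, **timing_kwargs):
    """Map each batch size to the name of the backend with the lowest median latency.

    `backends` is a dict of {name: callable(batch_size)}. The callables are
    placeholders for real inference wrappers, which are NOT provided here.
    """
    winners = {}
    for batch_size in batch_sizes:
        timings = {
            name: median_latency(run, batch_size, **timing_kwargs)
            for name, run in backends.items()
        }
        winners[batch_size] = min(timings, key=timings.get)
    return winners
```

Usage would look like `fastest_backend_per_batch({"demobert": run_trt, "ft": run_ft}, [1, 8, 32])`, where `run_trt` and `run_ft` are your own wrappers. The median is used rather than the mean so that a single slow outlier run does not decide the winner.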

What is NVIDIA’s guidance on when to use demoBERT vs. FT/EFT
(for BERT base and equivalent models)?

If demoBERT is the fastest solution, why should we refer to FT/EFT?


Please refer to the following doc.

Thank you.

