Hi,
NVIDIA/FasterTransformer on GitHub (Transformer-related optimization, including BERT and GPT)
offers multiple build options.
Query 1: Will choosing C++ or PyTorch affect the observed latencies in any way?
Query 2: In which cases might C++ be preferred over PyTorch, or vice versa?