Performance data mistakes in LLAMA inference

Inference Performance of LLAMA-2 posted by Nvidia

According to the link above, the inference latency of LLAMA-2-13B on an A100 80GB SXM4 at batch size=1 and tp=1 is lower than that of LLAMA-2-7B under the same conditions.

How was this performance data obtained? A larger model being faster than a smaller one under identical conditions seems implausible.
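A quick back-of-the-envelope check supports the skepticism. At batch size 1, single-stream decoding is typically memory-bandwidth bound, so per-token latency is roughly proportional to the weight bytes read per token. The sketch below is a lower-bound estimate only; the ~2.0 TB/s A100 80GB SXM4 HBM bandwidth and FP16 weights are my assumptions, not figures from the post.

```python
# Rough sanity check: at batch size 1, decoding is memory-bandwidth bound,
# so per-token latency scales with the weight bytes streamed per token.
# Assumptions (mine, not from the post): FP16 weights, ~2.0 TB/s HBM
# bandwidth for an A100 80GB SXM4, nominal parameter counts (7B, 13B).

HBM_BANDWIDTH_TBPS = 2.0  # approximate A100 80GB SXM4 HBM bandwidth
BYTES_PER_PARAM = 2       # FP16

def per_token_latency_ms(n_params_billion: float) -> float:
    """Lower-bound per-token decode latency from weight traffic alone."""
    weight_bytes = n_params_billion * 1e9 * BYTES_PER_PARAM
    seconds = weight_bytes / (HBM_BANDWIDTH_TBPS * 1e12)
    return seconds * 1e3

lat_7b = per_token_latency_ms(7)
lat_13b = per_token_latency_ms(13)
print(f"7B : ~{lat_7b:.1f} ms/token")   # ~7.0 ms/token
print(f"13B: ~{lat_13b:.1f} ms/token")  # ~13.0 ms/token
```

By this estimate the 13B model reads nearly twice the weight bytes per token, so under identical conditions it cannot be faster than the 7B model; a reported result to the contrary suggests a data-entry or labeling error.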

You may get a better response by posting your question in the NeMo discussions area.