Inference Performance of LLAMA-2 posted by Nvidia
According to the link above, the inference latency of LLAMA-2-13B on an A100 80GB SXM4, at batch size=1 and tp=1, is lower than the latency of LLAMA-2-7B under the same conditions.
How was this performance data obtained? It seems implausible: a 13B model performs roughly twice the computation per token of a 7B model, so it should not have lower latency under identical conditions.
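For anyone wanting to sanity-check such numbers themselves, a minimal latency-measurement sketch is below. The model call is a placeholder standing in for a real forward pass (this is not NVIDIA's benchmark harness); the point is the methodology: warmup iterations first, then report the median over repeated timed runs.

```python
import time
import statistics

def measure_latency(run_inference, warmup=5, iters=20):
    """Time repeated single-batch inference calls; return median in ms.

    run_inference: zero-argument callable performing one forward pass.
    Warmup iterations are discarded so one-time costs (kernel
    compilation, cache population) do not skew the result.
    """
    for _ in range(warmup):
        run_inference()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_inference()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples) * 1000  # milliseconds

# Hypothetical stand-in for e.g. a LLaMA-2 forward pass at batch size 1.
def dummy_model():
    time.sleep(0.001)

median_ms = measure_latency(dummy_model)
print(f"median latency: {median_ms:.2f} ms")
```

With a real model on GPU you would also need to synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the timer, since GPU kernels launch asynchronously; without that, the 13B-faster-than-7B anomaly could simply be a measurement artifact.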