Power Text-Generation Applications with Mistral NeMo 12B Running on a Single GPU

Originally published at: https://developer.nvidia.com/blog/power-text-generation-applications-with-mistral-nemo-12b-running-on-a-single-gpu/

NVIDIA collaborated with Mistral to co-build the next-generation language model that achieves leading performance across benchmarks in its class. With a growing number of language models purpose-built for select tasks, NVIDIA Research and Mistral AI combined forces to offer a versatile, open language model that’s performant and runs on a single GPU. This post explores…

‘Single GPU’ — do you mean 6GB, 12GB, 24GB, 40GB, or 80GB of VRAM? These basic details are important. Check out the Llama 3.1 Model Card for how they comprehensively broke down the model description and requirements.


Hi @babatundeolanipekun - at just 12B parameters and close to 24GB in size, this model will fit easily on an A100/H100/H200. We’ll update the blog to include this specific information - thanks for your feedback.
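For anyone wanting to sanity-check the "close to 24GB" figure: weight memory is roughly parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate only (`weight_memory_gb` is a hypothetical helper, not part of any NVIDIA or Mistral tooling), and real deployments need extra headroom for the KV cache, activations, and framework overhead.

```python
# Rough VRAM needed just for the model weights, by precision.
# Illustrative only: actual usage adds KV cache, activations,
# and framework overhead on top of this figure.
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    return num_params * bytes_per_param / 1e9

params = 12e9  # Mistral NeMo 12B

print(weight_memory_gb(params, 2))  # FP16/BF16: 24.0 GB
print(weight_memory_gb(params, 1))  # 8-bit (e.g. FP8): 12.0 GB
```

At 16-bit precision the weights alone land at about 24GB, which is why 40GB-plus data-center GPUs are a comfortable fit; an 8-bit quantized variant would halve that.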


Thank you very much for the exciting article. Are the HumanEval benchmark values mentioned in the article for the base or the instruct model? Is there, by any chance, an overview of various benchmarks (HumanEval, MBPP) in which both models are listed? Were the values measured by you, or do they come from other research papers? Google, for example, reports MBPP scores of 52.4 and 59.2 for its base and instruct models.