New Mistral NeMo 12B - Advanced Language Model that Runs on a Single GPU

NVIDIA and Mistral co-developed Mistral NeMo 12B, a state-of-the-art model that excels across various benchmarks, including common sense reasoning, world knowledge, coding, math, and multilingual conversations. See benchmark details here.

  • 128K Context Window: Dense transformer model with a 128K-token context length for enhanced understanding and processing of complex information.

  • Training Data: Trained on Mistral’s proprietary dataset, featuring a large proportion of multilingual and code data.

  • Training Optimizations: Utilizes NVIDIA Megatron-LM, part of NVIDIA NeMo, for efficient large-scale training on NVIDIA DGX Cloud.

  • Inference Optimizations: Enhanced with NVIDIA TensorRT-LLM engines for higher performance, including optimizations like in-flight batching, KV caching, and FP8 support.

Deployment:

  • NVIDIA NIM: Packaged as an NVIDIA NIM inference microservice, enabling streamlined deployment across platforms with high-throughput inference.

  • Use cases: Ideal for tasks such as document summarization, classification, multi-turn conversations, language translation, and code generation.

  • Open Licensing: Available under the Apache 2.0 license, allowing customization and integration into commercial applications.

Getting Started:

Experience Mistral NeMo NIM by visiting ai.nvidia.com, and use free NVIDIA cloud credits to test the model and build proofs of concept.
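As a rough sketch of such a proof of concept, the hosted NIM microservice can be called through an OpenAI-style chat-completions API. The endpoint URL, model ID, and `NVIDIA_API_KEY` environment variable below are assumptions for illustration; check the model card on ai.nvidia.com for the exact values.

```python
import json
import os
import urllib.request

# Assumed values -- verify against the Mistral NeMo model card on ai.nvidia.com.
NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "nv-mistralai/mistral-nemo-12b-instruct"


def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload for the NIM endpoint."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }


def summarize(text: str, api_key: str) -> str:
    """Send a document-summarization prompt and return the model's reply."""
    payload = build_chat_request(f"Summarize the following document:\n\n{text}")
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__" and "NVIDIA_API_KEY" in os.environ:
    # Requires an API key from ai.nvidia.com (covered by the free cloud credits).
    print(summarize("NVIDIA and Mistral co-developed Mistral NeMo 12B...",
                    os.environ["NVIDIA_API_KEY"]))
```

The same payload structure works for the other listed use cases (classification, translation, code generation); only the prompt changes.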