Performance-Efficient Mamba-Chat from NVIDIA AI Foundation Models

Originally published at: https://developer.nvidia.com/blog/performance-efficient-mamba-chat-from-nvidia-ai-foundation-models/

This week’s release features the NVIDIA-optimized Mamba-Chat model, which you can try directly in your browser. This post is part of Model Mondays, a program focused on enabling easy access to state-of-the-art community and NVIDIA-built models. These models are optimized by NVIDIA using TensorRT-LLM and offered as .nemo files for easy customization and deployment. NVIDIA…