NVIDIA NeMo Accelerates LLM Innovation with Hybrid State Space Model Support

Originally published at: https://developer.nvidia.com/blog/nvidia-nemo-accelerates-llm-innovation-with-hybrid-state-space-model-support/

Today’s large language models (LLMs) are based on the transformer model architecture introduced in 2017. Since then, rapid advances in AI compute performance have enabled the creation of even larger transformer-based LLMs, dramatically improving their capabilities. Advanced transformer-based LLMs are enabling many exciting applications such as intelligent chatbots, computer code generation, and even chip design.…


This is a very interesting architecture. I expected we would see more SSM models proposed, and this seems to be a compelling alternative. Regarding extended context length, do we know how the relative effective context length (RECL) scales with this architecture? (See, for example, Transformer-XL.)