Applying Mixture of Experts in LLM Architectures

Originally published at: https://developer.nvidia.com/blog/applying-mixture-of-experts-in-llm-architectures/

Mixture of experts (MoE) large language model (LLM) architectures have recently emerged, both in proprietary LLMs such as GPT-4 and in community models, with the open-source release of Mistral AI's Mixtral 8x7B. The strong relative performance of the Mixtral model has generated considerable interest and raised numerous questions about MoE and its use in LLM architectures.
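
To make the idea concrete before diving in, here is a minimal, illustrative sketch of a top-2 gated MoE feed-forward layer in PyTorch, loosely in the spirit of Mixtral 8x7B (eight experts, two active per token). The class name, dimensions, and routing details are assumptions for illustration, not Mixtral's actual implementation.

```python
# Minimal sketch of a top-2 gated MoE feed-forward layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router produces one logit per expert for each token.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x):                          # x: (tokens, d_model)
        logits = self.router(x)                    # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the selected experts
        out = torch.zeros_like(x)
        # Route each token only through its top-k experts and blend the outputs.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e              # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out


tokens = torch.randn(4, 512)                       # 4 token embeddings
print(MoEFeedForward()(tokens).shape)              # torch.Size([4, 512])
```

The key property this sketch shows is sparsity: every token sees only two of the eight expert feed-forward blocks, so the layer holds many more parameters than it spends compute on per token.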