Originally published at: https://developer.nvidia.com/blog/deploying-a-1-3b-gpt-3-model-with-nvidia-nemo-megatron/
Large language models (LLMs) are some of the most advanced deep learning models capable of understanding written language. Many modern LLMs are built on the transformer architecture introduced by Google in the 2017 research paper Attention Is All You Need. NVIDIA NeMo Megatron is an end-to-end GPU-accelerated framework for training and deploying…
I loved deploying NeMo Megatron locally to power language-based applications and look forward to seeing exciting new ways to use LLMs. Let me know if you have any questions and I will be happy to help!
Hey, that was a great article! It worked well (20B params), but I'm having no luck changing the temperature. I tried changing it where it was defined, and no luck; tried adding it to the argparser, and still no luck… what am I missing?!
jkjk thank you for your patience and maybe your guidance :D
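One way to expose temperature is sketched below. It assumes the client script from the article sends a JSON payload to the NeMo text generation server over HTTP; the `--temperature` flag, the payload field names, and the port are assumptions rather than the article's exact code, and note that temperature typically only takes effect when greedy decoding is disabled.

```python
import argparse
import json

import requests

# Hypothetical flag names; adjust them to match the client script from the article.
parser = argparse.ArgumentParser()
parser.add_argument("--prompt", default="Tell me an interesting fact about space travel.")
parser.add_argument("--temperature", type=float, default=1.0)
parser.add_argument("--port", type=int, default=5555)
args = parser.parse_args()

# Assumed payload fields for the text generation server. Temperature generally has
# no effect under greedy decoding, so sampling must be enabled (greedy=False).
payload = {
    "sentences": [args.prompt],
    "tokens_to_generate": 50,
    "temperature": args.temperature,
    "greedy": False,
    "top_k": 0,
    "top_p": 0.9,
}

resp = requests.put(
    f"http://localhost:{args.port}/generate",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(resp.json()["sentences"][0])
```

Run it, for example, as `python client.py --temperature 0.7` and compare outputs across a few temperature values to confirm the setting is actually being picked up.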
Thanks for the excellent tutorial! Everything worked well. I did have one question about the tokenizer: why is the tokenizer GPT-2 even though the model is GPT-3?