Hi, I have the original FP16 version working in the “run everywhere” container. I have quantized the model to FP8 and would now like to package this version into a NIM container so I can run it the same way I run the original FP16 model. Could you help me set this up, please?
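For reference, here is roughly how I plan to sanity-check the FP8 deployment once the container is up. This is just a minimal sketch assuming the NIM container exposes its usual OpenAI-compatible API on port 8000; `my-model-fp8` is a placeholder for whatever model id the container actually reports:

```python
# Minimal smoke test against a locally running NIM container.
# Assumes the OpenAI-compatible API is exposed on localhost:8000;
# "my-model-fp8" is a placeholder model id, not a real one.
import requests

BASE_URL = "http://localhost:8000/v1"

# List the models the container is serving, to verify the FP8 build loaded.
models = requests.get(f"{BASE_URL}/models").json()
print("Served models:", [m["id"] for m in models["data"]])

# Send a small chat completion as a basic end-to-end check.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "my-model-fp8",  # placeholder: use an id reported above
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 32,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```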