Hi, I have the original FP16 version working in the “run everywhere” container. I have quantized the model to FP8 and would now like to package this version into a NIM container so I can run it the same way I run the original FP16 model. Could you help me set this up, please?
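For reference, here is roughly how I plan to sanity-check the FP8 deployment once the container is up. This is just a minimal sketch assuming the NIM container exposes its usual OpenAI-compatible API on port 8000; `my-model-fp8` is a placeholder for whatever model id the container actually reports:

```python
# Minimal smoke test against a locally running NIM container.
# Assumes the OpenAI-compatible API is exposed on localhost:8000;
# "my-model-fp8" is a placeholder model id, not a real one.
import requests

BASE_URL = "http://localhost:8000/v1"

# List the models the container is serving, to verify the FP8 build loaded.
models = requests.get(f"{BASE_URL}/models").json()
print("Served models:", [m["id"] for m in models["data"]])

# Send a small chat completion as a basic end-to-end check.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "my-model-fp8",  # placeholder: use an id reported above
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 32,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```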