Hi there,
Is it possible to deploy the large models, such as llama-3.1-405b-instruct and nemotron-4-340b-reward, with NIM somewhere, or to get access to more credits for further experimentation with NeMo Curator? The credits didn't allow for much testing.
Hugging Face doesn't allow you to deploy NIMs for models of this size.
Is the solution here to deploy on a cloud provider with the self-hosted method?
Information about how to get further access to NIMs like this is not easy to find.
Thanks for your help & time
Hey @isaac.woodruff – yeah, deploying on a cloud provider with the self-hosted method would be our recommendation at the moment. You can take a look at the "Deploy with Helm" docs here: Deploying with Helm - NVIDIA Docs, which covers deploying multinode models like those two.
Let me know if you have any questions or run into any issues!
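For reference, a self-hosted multinode deployment along those lines looks roughly like the sketch below. The chart version, image repository, release name, and resource values are assumptions for illustration – check the "Deploying with Helm" docs for the exact chart and values your cluster needs:

```shell
# Sketch of a self-hosted NIM deployment via Helm (chart version and
# values below are illustrative assumptions -- see the NVIDIA docs).

# Fetch the NIM LLM Helm chart from NGC (requires an NGC API key).
helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.3.0.tgz \
  --username='$oauthtoken' --password="$NGC_API_KEY"

# Store the NGC API key as Kubernetes secrets for image and model pulls.
kubectl create secret docker-registry ngc-secret \
  --docker-server=nvcr.io --docker-username='$oauthtoken' \
  --docker-password="$NGC_API_KEY"
kubectl create secret generic ngc-api --from-literal=NGC_API_KEY="$NGC_API_KEY"

# Install with multinode settings; a 405B-class model spans several
# multi-GPU nodes, so the GPU count here is purely illustrative.
helm install llama-405b nim-llm-1.3.0.tgz \
  --set image.repository=nvcr.io/nim/meta/llama-3.1-405b-instruct \
  --set multiNode.enabled=true \
  --set resources.limits."nvidia\.com/gpu"=8
```

You'd run this against a Kubernetes cluster on your cloud provider with GPU nodes and the NVIDIA GPU Operator already set up.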
Thanks for the fast reply.
Are there any other fast solutions for deploying endpoints for these larger models, similar to what's available on Hugging Face for the smaller models? Something closer to "pay-as-you-go" endpoints would be more ideal than configuring multi-GPU clusters with tons of storage on a cloud provider…
@neal.vaidya
@calexiuk
I would appreciate it if we could buy more credits as well.