Hi everyone,
I’ve been working with NVIDIA NeMo to fine-tune a LLaMA 3.1 8B model using the megatron_finetune.py script. The training went well, and I successfully saved the LoRA weights (a .nemo file) in the results/ directory. Now I want to use this fine-tuned LoRA model with NVIDIA NIM, but I’m not sure where exactly to place the .nemo file or how to register it with the NIM server.
Here’s a summary of what I did:
- Fine-Tuning Process:
  - Used the megatron_finetune.py script in NeMo to fine-tune the LLaMA 3.1 8B model.
  - The resulting LoRA weights were saved as megatron_gpt_peft_lora_tuning.nemo in the directory /workspace/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/ (a quick way to inspect the checkpoint is shown below).
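As a sanity check on the checkpoint itself: .nemo files are tar archives, so the adapter weights and bundled config can be listed directly. A minimal sketch (the path is just mine from above):

import tarfile

# A .nemo checkpoint is a tar archive; listing its members shows the
# LoRA weights and the model_config.yaml packaged inside.
path = "/workspace/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning.nemo"
with tarfile.open(path, "r:*") as archive:  # "r:*" auto-detects compression
    for member in archive.getnames():
        print(member)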
- Moving the Model to NIM:
  - I copied the .nemo file to the NIM container with:
    docker cp /workspace/results/Meta-llama3.1-8B-Instruct-titlegen/checkpoints/megatron_gpt_peft_lora_tuning.nemo nim:/workspace/loras/
  - I verified the file is present in the NIM container:
    docker exec -it nim ls /workspace/loras
  - The output shows that the file is indeed there (see the layout sketch after this list).
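One thing I'm unsure about: NVIDIA's NIM PEFT documentation describes NIM_PEFT_SOURCE as holding one subdirectory per adapter, where the subdirectory name becomes the served model name. If that applies here, the .nemo file probably should not sit directly in /workspace/loras/. A sketch of that restructuring, assuming llama3.1-8b-law-titlegen is the name I want to serve under:

from pathlib import Path
import shutil

# Assumption: NIM discovers adapters as NIM_PEFT_SOURCE/<model-name>/<checkpoint>,
# so the directory name is what shows up as the model id.
peft_source = Path("/workspace/loras")                   # matches NIM_PEFT_SOURCE
adapter_dir = peft_source / "llama3.1-8b-law-titlegen"   # assumed served name
adapter_dir.mkdir(parents=True, exist_ok=True)

ckpt = peft_source / "megatron_gpt_peft_lora_tuning.nemo"
shutil.move(str(ckpt), str(adapter_dir / ckpt.name))     # move .nemo into its own dir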
- Setting Up NIM:
  - My Docker Compose configuration includes (a readiness check follows the config):
services:
  nim:
    image: nim_custom:v1
    ports:
      - "8000:8000"
    environment:
      - NIM_PEFT_SOURCE=/workspace/loras
      - NIM_PEFT_REFRESH_INTERVAL=3600
    volumes:
      - /path/to/loras:/workspace/loras
    networks:
      - verb-network
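Before sending completion requests, I confirm the server is actually up. A minimal check against NIM's standard readiness endpoint (host and port from my compose file):

import requests

# NIM exposes a readiness endpoint; a 200 here means the server has
# finished loading and is ready to serve inference requests.
resp = requests.get("http://0.0.0.0:8000/v1/health/ready", timeout=5)
print(resp.status_code, resp.text)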
- Problem: When I try to use the LoRA model in NIM by making a request, I get a 404 error saying that the model llama3.1-8b-law-titlegen does not exist:
import requests

url = 'http://0.0.0.0:8000/v1/completions'
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
data = {
    "model": "llama3.1-8b-law-titlegen",
    "prompt": "Generate a concise, engaging title for the following legal question...",
    "max_tokens": 50
}
response = requests.post(url, headers=headers, json=data)
print(response.json())
The response I get is:
{
  "object": "error",
  "message": "The model `llama3.1-8b-law-titlegen` does not exist.",
  "type": "NotFoundError",
  "param": null,
  "code": 404
}
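To double-check which models the server has actually registered (the base model plus any LoRA adapters it discovered), I query the OpenAI-compatible model list:

import requests

# Lists every model id this NIM instance is currently serving; if the
# LoRA adapter was picked up, its name should appear here.
resp = requests.get("http://0.0.0.0:8000/v1/models")
for model in resp.json().get("data", []):
    print(model["id"])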
I suspect that I may need to configure or register the .nemo file in NIM differently. Does anyone know the correct location and method for registering the fine-tuned LoRA weights in NIM so that it can recognize and serve the model? Any help or suggestions would be greatly appreciated!
How should I properly register the fine-tuned .nemo file so that NIM can serve the model without errors? Or, better yet, in which directory inside the NIM container should I place the .nemo file produced by fine-tuning so that it is properly loaded and recognized?