Hi folks!
I want to deploy this LLM (Link) on a GPU-enabled Azure virtual machine. I created a GPU VM with an NVIDIA A100 GPU and selected the NVIDIA GPU-Optimized VMI (Link) as my OS image.
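As a sanity check (assuming the GPU-Optimized VMI ships the NVIDIA driver, Docker, and the NVIDIA Container Toolkit preinstalled, which may vary by image version), the following confirms the GPU is usable from the host:
# Confirm the A100 and driver are visible on the host
nvidia-smi
# Confirm Docker (for running NGC containers) is available
docker --version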
The model on Hugging Face has two versions, ‘base’ and ‘large’. I want to deploy the large version. The Hugging Face README contains a section, ‘Usage for Large Model Version,’ which refers to the ‘megatron_t5_seq2seq_eval.py’ and ‘megatron_t5_seq2seq_finetune.py’ scripts hosted on the NeMo GitHub repository.
My question is, since I used ‘NVIDIA GPU-Optimized VMI’ as my OS image, which includes multiple components from NVIDIA, what specific components and configurations are necessary to prepare the environment for this LLM? As the model lacks detailed instructions, I am unsure how to prepare the prerequisites.
Thank you!
Hi @KindnessCUDA ,
In order to deploy the large version of NACH0, please make sure the NeMo framework and its dependencies are installed: GitHub - NVIDIA/NeMo: A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
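If you prefer not to use the NGC NeMo container, a minimal sketch of installing the toolkit into an existing Python environment (assuming CUDA-enabled PyTorch is already set up; exact extras and versions may differ for your setup) would be:
# Cython is needed at build time by some NeMo components
pip install Cython
# Install the NeMo toolkit with all optional collections (NLP, ASR, TTS)
pip install "nemo_toolkit[all]"
# Quick check that the NLP collection imports
python -c "import nemo.collections.nlp as nemo_nlp; print(nemo_nlp.__name__)"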
Once NeMo is installed, you can follow the steps to run the https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_t5_seq2seq_eval.py script, for example:
# Clone NeMo repository
git clone https://github.com/NVIDIA/NeMo
cd NeMo/examples/nlp/language_modeling
# Run inference (example)
python megatron_t5_seq2seq_eval.py \
--config-path=conf \
--config-name=megatron_t5_config_finetune \
+trainer.devices=4 \
+model.restore_from_path=/path/to/nach0_large.nemo
Thank you @nilkanthp!
I sincerely appreciate your help 🙏
-
Based on this document, I’ve run this command to pull the NeMo Docker container image:
$ docker pull nvcr.io/nvidia/nemo:24.12.01
Q1: I am wondering if it contains all the required dependencies or if there are additional things that should be installed?
-
Then I cloned the model repository from Hugging Face into the VM.
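(Roughly like this; if the .nemo checkpoint is stored with Git LFS, the LFS step is needed so the full file is downloaded. The repository URL is a placeholder.)
# Git LFS so the large .nemo checkpoint is fully downloaded
git lfs install
git clone https://huggingface.co/<MODEL_REPO>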
-
After cloning the model repository, I ran the following commands:
docker run --gpus all --rm \
-it \
-v $(pwd):/workspace \
-w /workspace \
nvcr.io/nvidia/nemo:24.12.01 \
bash
-
Now I have access to the model repository files from the bash shell inside the container, since the host working directory is mounted into it.
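As a quick check inside the container (a sketch; exact versions will depend on the container tag), I can confirm the GPU and NeMo are usable:
# Inside the container: confirm the GPU is visible
nvidia-smi
# Confirm PyTorch sees the GPU and that NeMo imports
python -c "import torch; print(torch.cuda.is_available())"
python -c "import nemo; print(nemo.__version__)"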
Q2: Should I clone the entire NVIDIA/NeMo repository to the VM from GitHub as well, or is downloading the ‘megatron_t5_seq2seq_eval.py’ file to the VM sufficient?
-
Also, I found this post that mentioned these five parameters should be modified in the corresponding config file:
src_file_name
tgt_file_name
restore_from_path
write_predictions_to_file
output_file_path_prefix
Q3: Please correct me if I have a misunderstanding about these parameters:
- The src_file_name should be a full path to a .txt file that contains our prompts.
- The tgt_file_name should be a full path to an empty .txt file.
- The restore_from_path should be the full path to the .nemo file, which exists in the cloned directory of the Hugging Face repository.
- The write_predictions_to_file is a variable that accepts a True or False value.
- The output_file_path_prefix parameter determines the file format of the output file, such as .txt or another format.
Q4: I’m not sure exactly what “corresponding config file” means in this case. Does it refer to ‘megatron_t5_seq2seq_eval.py’ or ‘tokenizer_config.json’ or something else?
When I search for those names within the ‘megatron_t5_seq2seq_eval.py’ file, I can find them, but I’m not sure where exactly I should input my own values, such as the full paths to the input and output files.
Thanks!
Hi @KindnessCUDA ,
Regarding the configuration you shared, the container should include the required dependencies; still, please clone the NeMo repo (as mentioned previously) and locate ‘megatron_t5_seq2seq_eval.py’ to run it.
To set up the required file paths/configs, please ensure you have a config file containing the parameters you listed. Here’s just an example of a YAML config file:
model:
  restore_from_path: /path/to/nach0_large.nemo  # NeMo checkpoint
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1
data:
  src_file_name: /path/to/input_prompts.txt
  tgt_file_name: /path/to/target_responses.txt  # Optional for inference
infer:
  output_file_path_prefix: /output/path/predictions
  write_predictions_to_file: True
Then, you can execute it as follows:
python megatron_t5_seq2seq_eval.py \
--config-path=<PATH_TO_CONFIG_DIR> \
--config-name=<CONFIG_FILENAME>.yaml
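Alternatively, since the script is driven by Hydra, the same values can be passed as command-line overrides instead of editing the YAML (the keys below mirror the example config above; the actual config shipped in the NeMo repo may nest them differently, and the paths are placeholders):
python megatron_t5_seq2seq_eval.py \
--config-path=<PATH_TO_CONFIG_DIR> \
--config-name=<CONFIG_FILENAME> \
model.restore_from_path=/path/to/nach0_large.nemo \
data.src_file_name=/path/to/input_prompts.txt \
data.tgt_file_name=/path/to/target_responses.txt \
infer.write_predictions_to_file=True \
infer.output_file_path_prefix=/output/path/predictions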