How to Deploy and Run an LLM Designed with the 'NVIDIA NeMo Framework' and 'NVIDIA Megatron'

Hi folks!

I want to deploy this LLM (Link) on a GPU-enabled Azure virtual machine. I created a GPU VM with an NVIDIA A100 GPU and selected the NVIDIA GPU-Optimized VMI (Link) as my OS image.

The model on Hugging Face has two versions, ‘base’ and ‘large’. I want to deploy the large version. The Hugging Face README contains a section, ‘Usage for Large Model Version,’ which refers to the ‘megatron_t5_seq2seq_eval.py’ and ‘megatron_t5_seq2seq_finetune.py’ scripts hosted on the NeMo GitHub repository.

My question is: since I used the ‘NVIDIA GPU-Optimized VMI’ as my OS image, which already includes multiple NVIDIA components, what specific components and configurations are necessary to prepare the environment for this LLM? As the model card lacks detailed instructions, I am unsure how to prepare the prerequisites.

Thank you!

Hi @KindnessCUDA ,

To deploy the large version of NACH0, please make sure the NeMo framework and its dependencies are installed: GitHub - NVIDIA/NeMo: A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
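
If NeMo is not installed yet, two common ways to get it onto the VM are the prebuilt NeMo container from NGC or a pip install into a CUDA-enabled Python environment. This is only a sketch; the container tag below is an example, so pick a current one from the NGC catalog:

# Option 1: pull the prebuilt NeMo container from NGC (tag is only an example)
docker pull nvcr.io/nvidia/nemo:24.12.01

# Option 2: install the toolkit into an existing CUDA-enabled Python environment
pip install "nemo_toolkit[all]"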

Once it is installed, you can run the https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/megatron_t5_seq2seq_eval.py script, for example:

# Clone NeMo repository
git clone https://github.com/NVIDIA/NeMo
cd NeMo/examples/nlp/language_modeling

# Run inference (example)
python megatron_t5_seq2seq_eval.py \
  --config-path=conf \
  --config-name=megatron_t5_config_finetune \
  +trainer.devices=4 \
  +model.restore_from_path=/path/to/nach0_large.nemo
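
One note on the example above: +trainer.devices should match the number of GPUs actually visible on the VM, so on a single-A100 instance you would presumably pass +trainer.devices=1 rather than 4.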

Thank you @nilkanthp!
I sincerely appreciate your help 🙏

  • Based on this document, I ran this command to pull the NeMo Docker container:
    $ docker pull nvcr.io/nvidia/nemo:24.12.01
    Q1: Does this container include all the required dependencies, or is there anything else I need to install?

  • Then I cloned the model repository from Hugging Face into the VM.

  • After cloning the model repository, I ran the following commands:

docker run --gpus all --rm \
     -it \
     -v $(pwd):/workspace \
     -w /workspace \
     nvcr.io/nvidia/nemo:24.12.01 \
     bash
  • Now I have access to the model repository files from the bash shell inside the container, since the host directory is mounted into it.
    Q2: Should I clone the entire NVIDIA/NeMo repository to the VM from GitHub as well, or is downloading the ‘megatron_t5_seq2seq_eval.py’ file to the VM sufficient?

  • Also, I found this post, which mentions that these five parameters should be modified in the corresponding config file:

    1. src_file_name
    2. tgt_file_name
    3. restore_from_path
    4. write_predictions_to_file
    5. output_file_path_prefix

Q3: Please correct me if I have a misunderstanding about these parameters:

  1. The src_file_name should be a full path to a .txt file that contains our prompts.
  2. The tgt_file_name should be a full path to an empty .txt file.
  3. The restore_from_path should be the full path to the .nemo file, which exists in the cloned directory of the Hugging Face repository.
  4. The write_predictions_to_file is a variable that accepts a True or False value.
  5. The output_file_path_prefix parameter determines the file format of the output file, such as .txt or another format.

Q4: I’m not sure exactly what “corresponding config file” means in this case. Does it refer to ‘megatron_t5_seq2seq_eval.py’ or ‘tokenizer_config.json’ or something else?

When I search for those names within the ‘megatron_t5_seq2seq_eval.py’ file, I can find them, but I’m not sure where exactly I should input my own values, such as the full paths to the input and output files.

Thanks!

Hi @KindnessCUDA ,

Regarding the container you pulled, it should include the required dependencies; still, please clone the NeMo repo (as mentioned previously) rather than downloading only ‘megatron_t5_seq2seq_eval.py’, since the script expects its config directory (conf) to sit alongside it.
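
Once you are inside the NeMo container, a minimal sketch of getting the script in place might look like the following (the version check assumes the image ships the nemo Python package, which the NGC container normally does):

# Verify the NeMo package that ships with the container
python -c "import nemo; print(nemo.__version__)"

# Clone the NeMo repository to get the example script and its config directory
git clone https://github.com/NVIDIA/NeMo
cd NeMo/examples/nlp/language_modeling

# Both the script and its Hydra config directory should now be present
ls megatron_t5_seq2seq_eval.py conf/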
To set up the required file paths and configs, please make sure you have a config file containing those parameters. Here is just an example of a YAML config file:

model:
  restore_from_path: /path/to/nach0_large.nemo  # NeMo checkpoint
  tensor_model_parallel_size: 1
  pipeline_model_parallel_size: 1

data:
  src_file_name: /path/to/input_prompts.txt
  tgt_file_name: /path/to/target_responses.txt  # Optional for inference

infer:
  output_file_path_prefix: /output/path/predictions
  write_predictions_to_file: True

Then you can execute it as:

python megatron_t5_seq2seq_eval.py \
    --config-path=<PATH_TO_CONFIG_DIR> \
    --config-name=<CONFIG_FILENAME>
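
Alternatively, since these scripts use Hydra, you can override individual values directly on the command line instead of editing the file. This is only a sketch assuming the key names from the example YAML above; the script's own default config may nest them differently:

python megatron_t5_seq2seq_eval.py \
    --config-path=<PATH_TO_CONFIG_DIR> \
    --config-name=<CONFIG_FILENAME> \
    model.restore_from_path=/path/to/nach0_large.nemo \
    data.src_file_name=/path/to/input_prompts.txt \
    data.tgt_file_name=/path/to/target_responses.txt \
    infer.write_predictions_to_file=True \
    infer.output_file_path_prefix=/output/path/predictions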