Need some assistance in working with NeMo

Hello all,

I’m looking for the best way to install NeMo on my SLURM cluster. The cluster currently has no GPUs—it’s CPU-only—but I plan to upgrade with GPUs later. In the meantime, I’d like to start experimenting with NeMo using small models (<1B parameters) since anything larger may not be practical to run and train.

I’ve been trying to install the NeMo framework on my SLURM cluster but haven’t had much success, even with a simple “Hello, World” program. I want to confirm whether my installation approach is correct.

I looked into using Docker, but my impression is that it’s better suited to a single-machine setup. I did try it, though I suspect my own inexperience with Docker is causing some of the issues. I’m open to using Docker, but is it the best approach on a SLURM cluster? Since I need NeMo installed across multiple nodes, I believe the pip installation method might be more appropriate. However, when I tried to install nemo-curator via pip, I encountered an error.

For my setup, is there anything specific I need to do to make the pip installation work? Or should I explore Docker further despite using SLURM?

First, I want to check whether my approach is right for a SLURM cluster: should I go with Docker or with the pip install method? I don’t need the most cutting-edge NeMo version right now, but I would like to use NeMo 2.

Thanks!

Hi @phillipmobley2, I’ve moved this post to AI & Data Science, Deep Learning, and will follow up with the NeMo team to get some help here. Thanks for your patience!
