Hello all,
I’m looking for the best way to install NeMo on my SLURM cluster. The cluster currently has no GPUs—it’s CPU-only—but I plan to upgrade with GPUs later. In the meantime, I’d like to start experimenting with NeMo using small models (<1B parameters) since anything larger may not be practical to run and train.
I’ve been trying to install the NeMo framework on my SLURM cluster but haven’t had much success, even with a simple “Hello, World” program. I want to confirm whether my installation approach is correct.
I looked into using Docker, but my impression is that it’s better suited to a single-machine setup. I did try it, but I think my inexperience with Docker is causing some issues. I’m open to using Docker, but is it the best approach on a SLURM cluster? Since I need NeMo installed across multiple nodes, I believe the pip installation method might be more appropriate. However, when I tried to install nemo-curator via pip, I encountered an error.
For my setup, is there anything specific I need to do to make the pip installation work? Or should I explore Docker further despite using SLURM?
First, I want to check whether my approach is correct for the SLURM cluster: should I go with Docker or with the pip install method? I don’t need the most cutting-edge NeMo version right now, but I would like to use NeMo 2.
Thanks!