What is the current best practice for running PyTorch on the GH200 while keeping the ability to install and manage additional packages?
The ideal solution would be to install PyTorch in a conda environment, but this is not yet available, as mentioned here.
Another alternative is the NGC container, but package installations aren't persistent when working with the Singularity .sif format. I can convert the .sif into a writable sandbox (singularity build --sandbox pytorch pytorch.sif), but I don't have write access to the site-packages directory for pip installations. In any case, I would prefer not to keep rebuilding the .sif from the sandbox after each package installation, and keeping the unpacked sandbox around uses up a noticeable share of the file-count limit on my cluster directory.
A third option would be to pip install the wheels found here locally, but these don't include, e.g., torchvision.
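A quick way to see which of these packages a given environment actually provides (for example, after installing only the torch wheel) is a small check like the one below. It is a minimal sketch: the package names are just the ones discussed in this post, and it only tests whether each package can be found on the current Python path, not whether it was built with CUDA support.

```python
import importlib.util
import platform


def check_packages(names=("torch", "torchvision")):
    """Report whether each package is findable, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # On the GH200's Grace CPU this reports an ARM (aarch64) machine.
    print("machine:", platform.machine())
    for name, found in check_packages().items():
        print(f"{name}: {'found' if found else 'missing'}")
```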
Lastly, I tried building PyTorch from source in a new conda environment, but quickly ran into problems with the build when using the compilers from the NVIDIA HPC SDK.