Modulus dependent on specific PyTorch git commit, instead of available releases

In setting up a bare-metal Modulus environment, I’m presented with warnings about TorchScript being unavailable because of an unsupported PyTorch version:

Installed PyTorch version 1.13.0+cu117 is not TorchScript supported in Modulus. Version 1.13.0a0+d321be6 is officially supported.

This seems rather (unnecessarily) specific, and impossible to meet - short of building pytorch from source at the git commit point.

Can the version checking be relaxed to match available pytorch cuda wheels in pip?
Looking at the pytorch container history in ngc, it seems like they’re all built on obtuse commit points:

And I had the same experience with the 22.03 Modulus release.
Warnings about not having PyTorch 1.12.0a0+2c916ef despite installing 1.12.0+cu116 from pip.


Hi @bsarkar

This commit relates to the PyTorch commit used in the NGC PyTorch docker container we used as the base for building the Modulus image, as I think you identified. The TorchScript requirement is there because presently we primarily serve Modulus through the docker image, not bare-metal, meaning that’s where our tests and performance analysis occur.

This requirement can likely be relaxed (the warning can probably be ignored in most cases), but we need to set up the appropriate testing. Hopefully we will be able to do this in the near future. If you don’t want that requirement, you should be able to just override it by setting jit: true in your config files, which I think you’re doing already.
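To illustrate what "relaxed" could mean here, below is a rough sketch only. It is not the check Modulus actually performs, and the helper names (`release_tuple`, `exact_match`, `relaxed_match`) are hypothetical. It assumes that comparing only the numeric release part of the version string is acceptable, which treats the `a0` pre-release build in the container as equivalent to the pip release.

```python
import re


def release_tuple(version: str) -> tuple:
    """Extract the numeric (major, minor, patch) part of a version string.

    Anything after the patch number (pre-release tags such as "a0",
    local identifiers such as "+cu117" or "+d321be6") is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def exact_match(installed: str, supported: str) -> bool:
    # Strict comparison: only the exact container build passes.
    return installed == supported


def relaxed_match(installed: str, supported: str) -> bool:
    # Hypothetical relaxed comparison: only major.minor.patch must agree.
    return release_tuple(installed) == release_tuple(supported)


installed = "1.13.0+cu117"      # pip wheel from the warning above
supported = "1.13.0a0+d321be6"  # NGC container build from the warning above

print("exact match:  ", exact_match(installed, supported))
print("relaxed match:", relaxed_match(installed, supported))
```

With the two version strings from the warning, the strict comparison fails while the relaxed one passes. Whether TorchScript actually behaves the same on both builds is exactly what the testing mentioned above would need to confirm.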

Hi @ngeneva,

Thanks for the reply.
I’ve actually been setting jit: false when I see that warning - I hadn’t thought to try forcing on, will give it a go now.

My main surprise is that the PyTorch containers/blobs are being built on commit points in the first place - instead of the released tags - I’d expect the releases themselves to have a bit more reliability due to pytorch’s own CI/testing.

Hi @bsarkar

Yes, the warning shows. If you want to comment that out, it’s found in the trainer file.

Yeah, the commit point is a little odd. I believe this is partially related to some internal work that occurs on the NV side to optimize/secure the container for deployment. Regardless, JIT has been a bit of a mixed bag for us over the past few months, so we put some hard constraints on whether Modulus will use it by default or not (the user can always override).