Assertion error when restarting training

I’m getting an error when reloading models during the training phase. Has anyone got any advice?

Thanks,
Ben


Hi @mn17b2m

Can you tell us which version of Modulus you are running? Is this a bare-metal install or a Docker image?

I haven't seen this issue myself, but based on a related GitHub issue, it seems to be a bug others are hitting with the current PyTorch version, and it could be related to CUDA Graphs.

Perhaps try disabling CUDA Graphs by setting cuda_graphs: False in your config?

Where do I place it in the config file?

I keep getting errors. I have tried it in a few locations.


I’m running v22.03 as a bare-metal install on Google Colab.

@mn17b2m

CUDA Graphs is a feature of 22.07 and is not present in 22.03, so it's not relevant here. Based on the PyTorch issue thread I linked, you may want to try downgrading your PyTorch version; it seems this is happening for people on PyTorch 1.12. Please have a look there for more information that may be relevant to you.
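Before downgrading, it can help to confirm which PyTorch release is actually installed in the Colab runtime. Below is a minimal sketch; the helper name `is_affected_torch` is hypothetical, and it only encodes the observation from the linked thread that reports come from PyTorch 1.12. It is not a confirmed list of affected releases.

```python
import re


def is_affected_torch(version: str) -> bool:
    """Return True if a PyTorch version string is in the 1.12.x series.

    Hypothetical helper: it only reflects the report that the error shows up
    on PyTorch 1.12, not an official list of affected versions.
    """
    # Take major.minor from the start of the string; local build tags
    # such as "+cu113" after the patch number are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) == (1, 12)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        status = "1.12 series" if is_affected_torch(torch.__version__) else "not 1.12"
        print(f"torch {torch.__version__}: {status}")
```

If the check reports the 1.12 series, which earlier release to pin (and with which CUDA build) is not stated in this thread, so that choice should follow the linked PyTorch issue and the requirements for Modulus 22.03.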