I’ve got a program that runs on multiple GPUs, and I would like to use the saved model for transfer learning on a similar geometry (different enough that I believe the parametric approach won’t be accurate).
When running the modified geometry I include the outputted model “flow_network.0.pth” in an otherwise empty outputs folder. When examining the output, I can see that the model is loaded:
Success loading model: outputs/FreeStream/flow_network.0.pth
However, I can also see that it is not loading the model for other GPUs:
model flow_network.1.pth not found
When supplying a pretrained model, is the best practice simply to copy the saved model once for each GPU you plan to use? That is, ‘flow_network.0.pth’ gets copied to ‘flow_network.1.pth’, ‘flow_network.2.pth’, …, ‘flow_network.n.pth’.
I’ve been operating on the assumption that the model weights are combined at the end of training and that this is why only one model is saved. If that is not the case, why am I only seeing one final saved model when running in parallel? At the moment I run on 8 GPUs, but only ‘flow_network.0.pth’ is saved at the end.
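For reference, here is a minimal sketch of the copy step I am asking about, so it is clear what I mean. It is not a confirmed best practice; whether the framework actually needs one file per GPU is exactly what I am unsure of. The function name `replicate_checkpoint` is my own, and I am assuming ranks are numbered from 0 and that the per-rank files follow the `flow_network.<rank>.pth` pattern seen in the log messages.

```python
import shutil
from pathlib import Path


def replicate_checkpoint(output_dir, num_gpus, base_name="flow_network"):
    """Copy <base_name>.0.pth to <base_name>.<rank>.pth for ranks 1..num_gpus-1.

    Assumption: ranks start at 0 and every rank looks for its own file
    named <base_name>.<rank>.pth in the same output directory.
    """
    output_dir = Path(output_dir)
    source = output_dir / f"{base_name}.0.pth"
    if not source.is_file():
        raise FileNotFoundError(f"rank-0 checkpoint not found: {source}")

    created = []
    for rank in range(1, num_gpus):
        target = output_dir / f"{base_name}.{rank}.pth"
        if not target.exists():  # never overwrite an existing checkpoint
            shutil.copyfile(source, target)
            created.append(target)
    return created
```

For my 8-GPU run this would be called as `replicate_checkpoint("outputs/FreeStream", num_gpus=8)` before launching the modified geometry, which would create the files for ranks 1 through 7 next to the rank-0 file.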