I have tested a variety of parallel PyTorch programs on many machines. Many of the programs crash, but some of them do not crash on NGC 22.12-py3.
These are the driver, CUDA, and PyTorch versions I have tested:
Driver version: 525.25.60.13/525.78.01
CUDA version: 11.6/11.8/12.0
PyTorch version: 1.14/2.0/the master branch of PyTorch compiled with CUDA 12
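
In case it helps anyone compare environments, here is a minimal sketch for collecting the same three versions on a given machine. The helper name `collect_versions` is only illustrative, and it assumes `nvidia-smi` is on the PATH; if PyTorch or `nvidia-smi` is missing, it reports a placeholder instead of failing.

```python
import shutil
import subprocess


def collect_versions():
    """Return the PyTorch, CUDA, and NVIDIA driver versions as strings."""
    versions = {"pytorch": "not installed", "cuda": "unknown", "driver": "unknown"}

    try:
        import torch
        versions["pytorch"] = str(torch.__version__)
        # torch.version.cuda is the CUDA version PyTorch was built with
        # (None for CPU-only builds).
        versions["cuda"] = str(torch.version.cuda)
    except (ImportError, OSError):
        pass

    # Assumption: the driver version is read from nvidia-smi, if available.
    if shutil.which("nvidia-smi"):
        try:
            result = subprocess.run(
                ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
                capture_output=True,
                text=True,
                timeout=10,
            )
            if result.returncode == 0 and result.stdout.strip():
                # One line per GPU; the driver version is the same for all of them.
                versions["driver"] = result.stdout.strip().splitlines()[0]
        except (OSError, subprocess.TimeoutExpired):
            pass

    return versions


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Note that the CUDA version reported here is the one PyTorch was built against, which may differ from the toolkit installed on the host.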
I’m not sure whether this is a driver issue or a library issue.