I have a problem: my tasks crash when using several computing nodes. The problem is the same as the one described on the VASP support site: Forums / Bugreports / [SOLVED] VASP 5 crashes when using several computing nodes (large memory) http://cms.mpi.univie.ac.at/vasp-forum/forum_viewtopic.php?3.12037 . The solution offered there is to change the memory limits, but the mlx4 driver on our cluster doesn’t have the “log_mtts_per_seg” parameter. Can I change the maximum amount of registerable memory with this driver? Or is the only way to update OFED to version 1.5?
Unfortunately, the mlx4_core kernel module in OFED-1.3 doesn’t have this parameter. As far as I know, it appeared only in OFED-1.5. So adding an options line with log_mtts_per_seg to modprobe.conf causes an error, and the driver doesn’t load.
Is there a way to change the maximum amount of registerable memory in the old mlx4 driver?
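For reference, here is a rough sketch of how the registerable-memory limit is usually estimated. It assumes the relation given in the Open MPI FAQ, max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE, and it assumes that the driver exposes its parameters under /sys/module/mlx4_core/parameters, which may not be the case for an old mlx4_core build. The helper names are made up for illustration.

```python
import os

# Assumed sysfs location of the mlx4_core module parameters; an old
# driver may not expose all (or any) of them here.
PARAM_DIR = "/sys/module/mlx4_core/parameters"


def max_registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Estimate the registerable memory limit in bytes.

    Uses the relation from the Open MPI FAQ (an assumption here):
    2**log_num_mtt * 2**log_mtts_per_seg * page_size.
    """
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


def read_param(name, base=PARAM_DIR):
    """Return a module parameter as int, or None if it is missing or unreadable."""
    try:
        with open(os.path.join(base, name)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    num_mtt = read_param("log_num_mtt")
    per_seg = read_param("log_mtts_per_seg")
    if num_mtt is None or per_seg is None:
        print("mlx4_core does not expose both parameters on this node")
    else:
        limit = max_registerable_bytes(num_mtt, per_seg, os.sysconf("SC_PAGE_SIZE"))
        print("estimated limit: %.1f GiB" % (limit / 2.0**30))
```

If the estimate comes out well below the physical RAM of a node, that would be consistent with the crashes described in the linked thread. When log_mtts_per_seg is missing, the only value left to raise in this formula is log_num_mtt, provided the old driver accepts it.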
Unfortunately, OFED 1.5 doesn’t support our OS (SLES 10 SP1), and we can’t upgrade it. It seems that we need to install a different Linux system on our cluster…