Hi, I am running the following script on my SGE cluster (Ubuntu 12.04), submitted with qsub:
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -V
#$ -q normal
#$ -pe mpi 40
#$ -P Lab219
#$ -o output
#$ -e error
module load PhyML/3.3
mpirun --mca pml yalla -np 40 phyml-mpi -i proteic -b 10 -d aa
Here phyml-mpi is the parallel (Open MPI) version of PhyML. The --mca pml yalla option is used to select MXM (I have Mellanox OFED installed).
It produces lots of errors related to KNEM (see the qsub error and output files in the attachments), even though I specified the KNEM directory when installing OMPI.
/dev/knem is not present, and when I try to load the module with sudo modprobe knem, I get:
FATAL: Error inserting knem (/lib/modules/3.13.0-37-generic/updates/dkms/knem.ko): Invalid module format
Could anyone give me a hint on this issue? Should I perhaps install KNEM separately from the KNEM website and then rebuild OMPI against those knem drivers?
Thanks in advance
output.zip (921 Bytes)
error.zip (1.88 KB)
That doesn't seem like an OMPI error, since OMPI is an entirely userspace package. Something is wrong with the knem module; most likely it wasn't compiled for the currently running kernel. I would suggest reinstalling MOFED and then running 'modprobe knem' to check whether it loads.
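A minimal sketch of that check, assuming a bash shell on the affected node: it compares the kernel release recorded in knem.ko's vermagic string (read with modinfo -F vermagic) against uname -r. The module path is taken from the modprobe error in the question; the function names are hypothetical. Even when the releases match, "Invalid module format" can still come from symbol-version mismatches, which the kernel normally reports in dmesg.

```bash
#!/bin/bash
# Sketch: does knem.ko match the running kernel?
# The module path comes from the modprobe error in this thread; adjust if needed.

# vermagic_matches VERMAGIC RELEASE
# Succeeds if the first word of a module's vermagic string equals RELEASE.
vermagic_matches() {
    local built_for=${1%% *}   # drop everything from the first space onward
    [ "$built_for" = "$2" ]
}

check_knem() {
    local release ko vm
    release=$(uname -r)
    ko="/lib/modules/$release/updates/dkms/knem.ko"
    if [ ! -f "$ko" ]; then
        echo "no knem.ko found at $ko"
        return 1
    fi
    # modinfo -F prints a single field of the module's metadata.
    vm=$(modinfo -F vermagic "$ko")
    echo "running kernel: $release"
    echo "knem vermagic:  $vm"
    if vermagic_matches "$vm" "$release"; then
        echo "release matches; look at 'dmesg | tail' for symbol version errors"
    else
        echo "knem.ko was built for another kernel; it needs to be rebuilt"
    fi
}

check_knem
```

If the vermagic release differs from uname -r, the module simply has to be rebuilt for the running kernel, either by reinstalling MOFED or through DKMS.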
The message "FATAL: Error inserting knem (/lib/modules/3.13.0-37-generic/updates/dkms/knem.ko): Invalid module format" indicates that the module was not built to match your kernel, even though it is in the correct DKMS directory. I would also guess that your MPI is built to use knem, since it doesn't complain about knem being missing, and nothing would try to load it otherwise. So I would download the latest version from the KNEM site, build it, and install it. Note: read the section of the instructions about modifying the udev file. This will be necessary unless everyone is in the RDMA group. KNEM does make a difference: it allows zero-copy transfers within a node, and it doesn't have the security setup problems of the other zero-copy options.
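A rough sketch of the build-from-source route, assuming a standard autotools tarball from the KNEM site that unpacks into a directory with the same base name. The tarball name, the /opt/knem prefix, the rule file name, and the default group "rdma" with mode 0660 are all placeholders, not values from the KNEM docs; if not every MPI user is in that group, set KNEM_GROUP/KNEM_MODE (or follow the udev section of the KNEM instructions) accordingly.

```bash
#!/bin/bash
# Sketch only: build KNEM from source and install it under a prefix.
# Tarball name, /opt/knem, the rule file name and the group/mode are placeholders;
# the KNEM installation docs are the authoritative reference.

# knem_udev_rule GROUP MODE: print a udev rule setting group and mode of /dev/knem.
knem_udev_rule() {
    printf 'KERNEL=="knem", GROUP="%s", MODE="%s"\n' "$1" "$2"
}

# build_knem TARBALL [PREFIX]
build_knem() {
    local tarball=$1 prefix=${2:-/opt/knem} ko
    # Subshell so the cd does not change the caller's working directory.
    (
        tar xzf "$tarball" && cd "${tarball%.tar.gz}" &&
        ./configure --prefix="$prefix" && make && sudo make install
    ) || return 1
    # Control who may open /dev/knem (hypothetical defaults: group rdma, mode 0660).
    knem_udev_rule "${KNEM_GROUP:-rdma}" "${KNEM_MODE:-0660}" |
        sudo tee /etc/udev/rules.d/10-knem.rules >/dev/null
    sudo udevadm control --reload-rules
    # The install location of knem.ko can vary, so search the prefix for it.
    ko=$(find "$prefix" -name knem.ko | head -n 1)
    [ -n "$ko" ] && sudo insmod "$ko" && ls -l /dev/knem
}
```

Usage would be something like build_knem knem-X.Y.Z.tar.gz on every compute node (X.Y.Z standing for whatever release is downloaded), after which OMPI can be pointed at the same prefix when it is rebuilt.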
Here are the instructions:
KNEM: Fast Intra-Node MPI Communication http://knem.gforge.inria.fr/doc/
OMPI can use the knem module, but it has nothing to do with compiling it: knem is part of the kernel, not part of OMPI. If any kernel module, such as knem, cannot be loaded because of wrong symbols, the issue should be taken up with the kernel module's developers.
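To see the OMPI side of this, one option is to ask ompi_info whether the installed Open MPI reports any knem-related parameters. This is a sketch that assumes ompi_info from the same installation is on PATH; parameter names differ between Open MPI releases (for example btl_sm_use_knem in older series), so it only greps for "knem" rather than relying on a specific name.

```bash
#!/bin/bash
# Sketch: does the Open MPI on PATH report any knem-related parameters?

# knem_lines: filter ompi_info output (stdin) down to knem-related lines.
knem_lines() {
    grep -i 'knem'
}

if command -v ompi_info >/dev/null 2>&1; then
    # --all prints every MCA parameter; grep exits non-zero if nothing matched.
    ompi_info --all | knem_lines || echo "ompi_info reports no knem-related parameters"
else
    echo "ompi_info not found in PATH"
fi
```

Seeing knem parameters here only shows that OMPI was built with knem support; whether the kernel module actually loads is a separate question.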
In the meantime, you might try recompiling the modules for your kernel and see if that helps. This link might be a good starting point: Command to rebuild all DKMS modules for all installed kernels? - Ask Ubuntu http://askubuntu.com/questions/53364/command-to-rebuild-all-dkms-modules-for-all-installed-kernels
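A sketch of the DKMS route for knem alone, assuming knem is registered with DKMS (the /updates/dkms/ path in the error suggests it is) and that sudo and apt-get are available. The helper names are hypothetical. Because the dkms status output format differs between dkms releases, the parser accepts both the older "knem, VERSION, ..." and the newer "knem/VERSION, ..." forms.

```bash
#!/bin/bash
# Sketch: force DKMS to rebuild knem for one kernel (default: the running one).

# knem_dkms_version: read `dkms status` output on stdin, print the first knem version.
knem_dkms_version() {
    sed -n -E 's/^knem[,/] *([^,]+),.*/\1/p' | head -n 1
}

# rebuild_knem [KERNEL]
rebuild_knem() {
    local kernel=${1:-$(uname -r)} version
    version=$(dkms status | knem_dkms_version)
    if [ -z "$version" ]; then
        echo "knem is not registered with DKMS" >&2
        return 1
    fi
    # Headers for the target kernel are needed to build any DKMS module.
    sudo apt-get install -y "linux-headers-$kernel"
    # Remove and reinstall so DKMS rebuilds instead of reusing the old knem.ko.
    sudo dkms remove "knem/$version" -k "$kernel"
    sudo dkms install "knem/$version" -k "$kernel"
    sudo modprobe knem && ls -l /dev/knem
}
```

rebuild_knem would be run on each compute node (or as rebuild_knem 3.13.0-37-generic for an explicit kernel). An explicit remove plus install is used here rather than dkms autoinstall, because autoinstall may skip a module that DKMS already considers installed for that kernel.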