mlnx-ofed-kernel installation failed!

Hi,

I'm currently working on a project that needs a Mellanox InfiniBand card for our GPUDirect research. We are trying to set up a machine with the InfiniBand card we just received, but the driver installation fails to fully install “mlnx-ofed-kernel-2.0”. We hope to get some help with this. Below are the output we get and some information about our machine.

CHECKING DEVICE AVAILABILITY

gpu1@gpu1-System-Product-Name:~$ lspci -v | grep Mellanox

05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

Subsystem: Mellanox Technologies Device 0050

CHECKING UBUNTU VERSION

gpu1@gpu1-System-Product-Name:~/Downloads/MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64$ uname -a

Linux gpu1-System-Product-Name 3.8.0-34-generic #49~precise1-Ubuntu SMP Wed Nov 13 18:05:00 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

LAUNCHING THE INSTALLER

gpu1@gpu1-System-Product-Name:~/Downloads/MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64$ sudo ./mlnxofedinstall

[sudo] password for gpu1:

Log: /tmp/ofed.build.log

This program will install the MLNX_OFED_LINUX package on your machine.

Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.

Do you want to continue?[y/N]:y

Checking SW Requirements…

Checking for old packages…

Removing old packages…

Installing new packages

Installing ofed-scripts-2.0…

Running: /usr/bin/dpkg -i --force-confmiss /home/gpu1/Downloads/MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64/DEBS/ofed-scripts_2.0-1_amd64.deb

Installing mlnx-ofed-kernel-2.0…

Running: /usr/bin/dpkg -i --force-confnew --force-confmiss /home/gpu1/Downloads/MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64/DEBS/mlnx-ofed-kernel-dkms_2.0-OFED.2.0.2.6.9.6.g3a2d7bf_all.deb /home/gpu1/Downloads/MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64/DEBS/mlnx-ofed-kernel-utils_2.0-OFED.2.0.2.6.9.6.g3a2d7bf_amd64.deb

mlnx-ofed-kernel installation failed!

Removing newly installed packages…

Running: /usr/sbin/ofed_uninstall.sh --force

Please let me know if you need more data about this. Attached is the installer's log file. Thank you.

NOTES:

  • The output shown here is from the second time I ran the installer. On the first run, the installer installed some other required packages before proceeding to install ofed-scripts-2.0 and mlnx-ofed-kernel-2.0.
  • The computer was freshly formatted with Ubuntu 12.04. CUDA 5.5 and NVIDIA driver 331 have also been installed.

Amirul,

MIMOS Software Engineer

ofed.uninstall.log.zip (792 Bytes)

ofed.build.log.zip (5.55 KB)

MOFED 2.0-3.0.0 supports only Ubuntu 12.04. Your version (13.04) will be supported in the upcoming 2.1 release (only a few weeks away).

Hi Andre,

Quick question: if I'm trying to use GPUDirect RDMA between two NVIDIA Kepler K20c cards in two different machines, which InfiniBand driver should I use? Currently we have a ConnectX-3 FDR InfiniBand 40GigE card in our machine.

Thanks.

Well, not to derail amirulom’s thread, but for me at the moment:

lsb_release -a

No LSB modules are available.

Distributor ID: Ubuntu

Description: Ubuntu 13.04

Release: 13.04

Codename: raring

uname -r

3.8.0-34-generic

Thanks. I saw that it was expected this month. About how soon after Ubuntu 14.04 LTS (April 14) is released can we expect support for it? My goal is to standardize on that for a while (probably dreaming).

Your ofed.build.log points to a problem finding the kernel source package for 3.8.0-34-generic.

The README for mlnx-ofed-kernel (see MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64\SOURCES\mlnx-ofed-kernel_2.0.orig.tar.gz\mlnx-ofed-kernel-2.0) says that the following kernel subtree is based on kernel.org 3.7:

include/

drivers/

net/

Documentation/

So if the kernel subtree for 3.8.0-34 differs from 3.7, that would explain why the kernel source package cannot be found during the build. Have you tried using a 3.7 kernel?
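To make the version reasoning above concrete, here is a minimal Python sketch that extracts major.minor from a `uname -r` string and compares it with the 3.7 series the README names. Treating "same major.minor series" as compatible is an assumption made for illustration, not a rule stated by Mellanox, and `3.7.10-030710-generic` is only an example of what a mainline 3.7.10 kernel might report.

```python
import re


def parse_kernel_release(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a `uname -r` string like '3.8.0-34-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def matches_base_series(release: str, base: tuple[int, int] = (3, 7)) -> bool:
    """True if the kernel is in the same major.minor series as the base subtree.

    The (3, 7) default mirrors the README's "based on kernel.org 3.7";
    equating "same series" with "compatible" is an illustrative assumption.
    """
    major, minor, _ = parse_kernel_release(release)
    return (major, minor) == base


if __name__ == "__main__":
    # Example strings: the failing kernel from this thread and a hypothetical mainline 3.7.10.
    for rel in ("3.8.0-34-generic", "3.7.10-030710-generic"):
        print(rel, "->", matches_base_series(rel))
```

A mismatch here does not prove the build will fail; it only flags that the DKMS build is targeting a kernel newer than the subtree the driver sources were taken from.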

Also, please share /var/lib/dkms/mlnx-ofed-kernel/2.0/build/make.log, as it might have more details.
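If make.log is long, one quick way to pull out the relevant part is to look for lines mentioning "error" along with a little preceding context. The Python sketch below does that for the path quoted above. Matching on the substring "error" is a rough heuristic (it will also catch things like `-Werror` flags), so treat its output as a starting point rather than a diagnosis.

```python
from pathlib import Path

# Path quoted in this thread; the "2.0" component follows the installed package version.
MAKE_LOG = Path("/var/lib/dkms/mlnx-ofed-kernel/2.0/build/make.log")


def find_build_errors(lines: list[str], context: int = 2) -> list[str]:
    """Return snippets ending at each line that contains 'error' (case-insensitive)."""
    snippets = []
    for i, line in enumerate(lines):
        if "error" in line.lower():
            start = max(0, i - context)  # include up to `context` preceding lines
            snippets.append("\n".join(lines[start:i + 1]))
    return snippets


if __name__ == "__main__":
    if MAKE_LOG.exists():
        log_lines = MAKE_LOG.read_text(errors="replace").splitlines()
        for snippet in find_build_errors(log_lines):
            print(snippet)
            print("-" * 40)
    else:
        print(f"{MAKE_LOG} not found; the DKMS build may not have run yet")
```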

Thanks, Andre!!

It worked! My installation finished successfully. What I did was change the kernel version to 3.7.10-raring.

Kernel/MainlineBuilds - Ubuntu Wiki: this page shows how to install a specific kernel version.

Index of /~kernel-ppa/mainline (http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D): this site stores all kinds of Ubuntu kernel versions.

It took me a whole day to figure out how to install/downgrade my kernel version, but thanks to Google it works. Now I'm trying to set up and run MPI code to test the InfiniBand link.

You are correct, there is no add_kernel script for Ubuntu: on Ubuntu the driver is built from deb packages, so the code is always compiled.

The next MLNX OFED release (2.1) will support the following Ubuntu versions: 12.04, 13.04 and 13.10. Which version/kernel combination are you looking for?
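The support matrix described in this thread (2.0-3.0.0: Ubuntu 12.04 only; 2.1: 12.04, 13.04 and 13.10) can be checked against `lsb_release -a` output with a small Python sketch like the one below. The `SUPPORTED_UBUNTU` table is transcribed from the posts here, not from an official Mellanox compatibility list, so verify it against the release notes for your bundle.

```python
# Transcribed from this thread, not an official matrix.
SUPPORTED_UBUNTU = {
    "2.0-3.0.0": {"12.04"},
    "2.1": {"12.04", "13.04", "13.10"},
}


def parse_lsb_release(output: str) -> dict[str, str]:
    """Turn `lsb_release -a` output ('Key:<tab>value' lines) into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, e.g. "No LSB modules are available."
            info[key.strip()] = value.strip()
    return info


def is_supported(lsb_output: str, mofed_version: str) -> bool:
    """True if the 'Release' field is listed for the given MOFED version."""
    release = parse_lsb_release(lsb_output).get("Release")
    return release in SUPPORTED_UBUNTU.get(mofed_version, set())


if __name__ == "__main__":
    sample = "Distributor ID:\tUbuntu\nDescription:\tUbuntu 13.04\nRelease:\t13.04\nCodename:\traring\n"
    for version in SUPPORTED_UBUNTU:
        print(version, is_supported(sample, version))
```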

Just throwing this out there: the OFED docs say:

"If your kernel version does not match with any of the offered pre-built RPMs, you can add your kernel version by using the “mlnx_add_kernel_support.sh” script located under the docs/ directory."

That said, at least the MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64.tgz version of the drivers does not have said script. :-( Gonna check the .iso…
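Before relying on the documented `mlnx_add_kernel_support.sh` script, it is easy to confirm whether a given extracted bundle (or mounted .iso) actually ships it. This Python sketch simply walks a directory tree and reports every file with that name; the directory passed on the command line is whatever path you extracted or mounted the bundle to.

```python
import os
import sys

SCRIPT_NAME = "mlnx_add_kernel_support.sh"  # name quoted from the OFED docs


def find_script(root: str, name: str = SCRIPT_NAME) -> list[str]:
    """Walk `root` recursively and return the sorted paths of every file named `name`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"usage: {sys.argv[0]} <extracted-or-mounted-bundle-dir>")
    else:
        root = sys.argv[1]
        found = find_script(root)
        print("\n".join(found) if found else f"{SCRIPT_NAME} not found under {root}")
```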

Generally, if it's stable by Q2 we might have it in Q3.

The GPUDirect RDMA beta release will be available on our website by tomorrow; the driver will be installed with the Mellanox OFED 2.1 software, which is a prerequisite for the installation.

Please check here: http://www.mellanox.com/page/products_dyn?product_family=116&mtag=gpudirect for the latest beta version of GPUDirect RDMA.
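Since MLNX_OFED 2.1 is stated to be a prerequisite for the GPUDirect RDMA beta, a quick sanity check is to read the version out of the bundle name before installing. The Python sketch below assumes bundle names follow the pattern seen in this thread (`MLNX_OFED_LINUX-<version>-<distro>-<arch>`); that regex is an assumption about naming, and the check only looks at major.minor of the bundle, not at what is actually installed on the system.

```python
import re

# Assumed naming pattern, based on MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64.
BUNDLE_RE = re.compile(
    r"MLNX_OFED_LINUX-(?P<version>\d+\.\d+(?:-[\d.]+)?)-(?P<distro>[a-z]+[\d.]+)-(?P<arch>\w+)"
)
MIN_FOR_GPUDIRECT_RDMA = (2, 1)  # per this thread: MLNX_OFED 2.1 is a prerequisite


def parse_bundle_name(name: str) -> dict[str, str]:
    """Split a bundle file or directory name into version, distro and arch."""
    m = BUNDLE_RE.search(name)
    if m is None:
        raise ValueError(f"not an MLNX_OFED bundle name: {name!r}")
    return m.groupdict()


def meets_gpudirect_prereq(name: str) -> bool:
    """Compare the bundle's major.minor against the 2.1 minimum."""
    version = parse_bundle_name(name)["version"]
    major, minor = version.split("-")[0].split(".")[:2]
    return (int(major), int(minor)) >= MIN_FOR_GPUDIRECT_RDMA


if __name__ == "__main__":
    name = "MLNX_OFED_LINUX-2.0-3.0.0-ubuntu12.04-x86_64"
    print(parse_bundle_name(name), meets_gpudirect_prereq(name))
```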