MNPH29D - Driver build problem

I am getting a failure when I attempt to build the source driver on Red Hat Enterprise Linux Server 6.4. Is there support for RHEL 6.4, or is there something else I should be doing? There are no LEDs flashing at the back of the server. Do the cards need to be attached to a switch for the LEDs to light up? I am new to this, but I want to set up a high-performance cluster consisting of 4 servers.

I should mention that I see this in a hardware query.

15:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)

Finding Provides: /usr/lib/rpm/redhat/find-provides

error: line 204: Illegal char ‘(’ in: - Modified spec file to conform to KMP specifications

error: Failed to find Provides:

Provides: kernel-modules >= 2.6.32-358.2.1.el6.x86_64 mellanox-mlnx-en-kmod = 1.5.10-1.rhel6u4 modalias(pci:v000015B3d00001000svsdbcsci*) = 1.0-mlnx_ofed1.5.3

I have attached the full log file.

I appreciate your help, DH

install-mlx4_en.log.7750.zip (31.1 KB)

Can you try RHEL 6.2 first, and see if it comes up? Not sure we’ve done the QA on 6.4 for our EN driver yet, but I will check in the meantime.

Confirmed: we will support the EN driver on 6.4 in our upcoming 2.0 release, so 6.2 and possibly 6.3 are what's supported as of today.

Thanks for the response. I can’t really re-install the whole operating system, as it takes quite a bit of time for post-install configuration and other software installation. If there is no other alternative, then I will, but is there any idea when release 2.0 will come out?

It might be a little while before it’s GA. I don’t have publicly available release dates yet. Are you working with a sales team that can consult with you directly?

harperd, are you sure you need to build the source driver?

Asking because the “Infiniband Support” group in yum installs a bunch of drivers for Mellanox cards.

Note though, I haven’t personally used MNPH29D cards before, so you could be correct in doing so. Asking just in case though.

Hi there!

As far as I know, the Mellanox drivers aren’t released yet for RHEL 6.4. It will take a few more months before this is officially introduced.

If you must stick to 6.4 at this point, my best advice would be to work with the “inbox” drivers (whatever is shipped with the kernel).

RHEL 6.3 is fully supported if you are able to re-install those machines.

I think there may be pre-canned drivers with 6.4. Unfortunately, my switch hasn’t arrived yet, so I can’t tell if everything is working, but when I issue a query command, I get this as one of the line items. This makes me think that the OS is recognizing the card, but I’m not a Linux expert, so I’m not 100% sure.

“15:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)”
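For what it's worth, that line looks like `lspci` output; a minimal sketch of how to check for the adapter (the command is standard, the filter pattern is just illustrative):

```shell
# Sketch: list PCI devices and filter for the Mellanox adapter.
# If the kernel sees the card, a line like the one quoted above should appear.
lspci | grep -i mellanox
```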

Yeah, that line is a good sign. It’s showing the hardware is being found.

When the switch arrives and you have things hooked up, the easy way to try things is probably this:

$ sudo yum groupinstall "Infiniband Support"

In theory that should install working drivers.

Though, as mentioned by the guys above, Mellanox support might not really like that.

It’s pretty easy to remove those packages afterwards though if they want you to try a different approach:

$ sudo yum groupremove "Infiniband Support"

Just saying.

OK, I finally received my switch. I connected everything and nothing happened. I then did as justinclift suggested below. I rebooted, and everything seems to be lit up now. Thank you very much, justinclift. I am not sure how to configure the cards, verify that the servers are communicating, or test the setup. I did find this link, but it didn’t work for me:

How to install support for Mellanox Infiniband hardware on RHEL6 - Red Hat Customer Portal https://access.redhat.com/site/solutions/301643

justinclift wrote:

> $ sudo yum groupinstall "Infiniband Support"

Cool.

The commands you’ll probably find most useful right away are:

  • ibv_devinfo
  • ibaddr
  • ibdiagnet
  • ibhosts
  • ibnodes
  • ibping

The config files you’ll probably find most interesting are in the /etc/rdma/ directory.
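As an illustrative sketch of what lives there (the key names below are assumptions based on the RHEL 6 rdma package, not copied from your system — check your own /etc/rdma/rdma.conf):

```shell
# /etc/rdma/rdma.conf (illustrative fragment; keys are assumptions)
IPOIB_LOAD=yes   # load the IP-over-InfiniBand module when the rdma service starts
SRP_LOAD=no      # SCSI RDMA Protocol initiator module
```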

Out of curiosity, are you wanting to set up IP addressing for the adapters? It’s pretty easy to do.

Justin, I think I don’t have something configured properly. Here’s what I get:

[root@think172 ~]# ibv_devinfo
No IB devices found

[root@think172 ~]# ibaddr
ibwarn: [20819] mad_rpc_open_port: can't open UMAD port ((null):0)
ibaddr: iberror: failed: Failed to open '(null)' port '0'

[root@think172 ~]# ibdiagnet
Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.5.7
-W- Topology file is not specified.
    Reports regarding cluster links will use direct routes.
Loading IBDM from: /usr/lib64/ibdm1.5.7
-E- IBIS: No HCA was found on local machine.
Exiting.

[root@think172 ~]# ibnodes
src/query_smp.c:227; can't open UMAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: discover failed
src/query_smp.c:227; can't open UMAD port ((null):0)
/usr/sbin/ibnetdiscover: iberror: failed: discover failed

[root@think172 ~]# iping
bash: iping: command not found

[root@think172 ~]#

I appreciate your help, as I am still new to this. Regarding your question about IP addressing, I’m not sure, but I do know that my intent is to set up these machines for finite element analysis using Ansys RSM (Remote Solve Manager). I can choose to create a cluster from these machines or leave them as individual machines. Any suggestions?

Thanks, David.

Oops, I gave you bad info there.

Didn’t notice/remember that you’re using EN cards, i.e. Ethernet-only ones.

I don’t think the InfiniBand-specific commands work with them.

Guessing that your adapters should probably show up as Ethernet cards instead, e.g. eth0, eth1, eth2 or similar.

If that’s the case, normal ethernet configuration stuff should work fine.
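To sketch what that looks like with the standard RHEL 6 network scripts (the device name eth2 and the addresses below are placeholders, not taken from your machines):

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth2 (illustrative sketch;
# eth2 and the addresses are placeholders -- adjust for your network)
DEVICE=eth2
BOOTPROTO=static
IPADDR=192.168.10.11
NETMASK=255.255.255.0
ONBOOT=yes
```

Then `ifup eth2` (or `service network restart`) should bring the interface up with that address.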

I have no clue at all about Ansys RSM so no suggestions there.