Can't start fresh UFM install

Hey there!
I followed the manual as best I could on a fresh CentOS 7.9 machine, and I’m running into a weird error:

[user@host]$ sudo /etc/init.d/ufmd start
ufmd monitor
Not valid IP address for fabric interface ib0.  Exit

I have a case number open with Mellanox support, but I’d like to see if anyone here knows the answer and to document it for the public once we figure it out. Google has zero results for me.

Thanks!
-Derek

Hello Derek,

Thank you for posting your inquiry on the NVIDIA Developer Forum - Infrastructure and Networking - Section.

Based on the information provided, this is a common configuration issue. You need to assign a valid IP address (IPoIB) on the IB interface connected to the IB fabric.

You can do this manually with the standard Linux commands ‘ifconfig’ or ‘ip addr add’, or, to make it persistent across reboots, create an interface configuration file.

When a valid IP address is assigned to ib0, you will be able to start UFM properly.
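
As a rough sketch of the manual route (not taken from the UFM documentation): the address 10.254.254.7/24 below is only a placeholder, and the helper names `has_ipv4` and `assign_ib_addr` are made up for illustration. An address set this way is lost on reboot.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin (expected to be the
# output of `ip -4 -o addr show dev IFACE`) contains an IPv4 address.
has_ipv4() {
    grep -Eq 'inet [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/[0-9]+'
}

# Placeholder values; replace with an address that suits your network.
IB_IFACE=ib0
IB_ADDR=10.254.254.7/24

# Hypothetical helper: assign the address and bring the interface up.
assign_ib_addr() {
    sudo ip addr add "$IB_ADDR" dev "$IB_IFACE"
    sudo ip link set "$IB_IFACE" up
}

# Usage (on the UFM host):
#   assign_ib_addr
#   ip -4 -o addr show dev ib0 | has_ipv4 && echo "ib0 has an IPv4 address"
```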

This is listed as a prerequisite in the documentation: UFM Software Installation Prerequisites - UFM Enterprise UM v6.8 - NVIDIA Networking Docs

Thank you and regards,
~NVIDIA Networking Technical Support

Hey MvB!
Thank you for the reply!

Do we need to define an IP address for ib0 even if we’re not using IPoIB?
The line right above made me think that you could use either ib0 or eth0, my mistake.

How do I know what a valid IP address is for ib0? I checked some of our other servers, but I’m not seeing an IP address assigned for ib0 on them.

Thank you!

I tried applying a garbage IP to ib0 (10.254.254.7), tried to start it again, and it worked!

So looks like you were correct and I should’ve applied “some” IP address to ib0 (which I configured in /etc/sysconfig/network-scripts/ifcfg-ib0 as datagram mode, AKA non-IP…).

On the IP configuration note, I used section 13.8.7 here: 13.8. Configuring IPoIB Red Hat Enterprise Linux 7 | Red Hat Customer Portal

The file looks like:

BOOTPROTO=none
DEVICE=ib0
IPADDR=10.254.254.7
MTU=65520
NETWORK=10.254.254.0
BROADCAST=10.254.254.255
PREFIX=24
ONBOOT=yes
STARTMODE=auto
TYPE=InfiniBand
USERCTL=no
CONNECTED_MODE=no

Launched fine after that and I can access it via IP/ufm_web, and it’s seeing a lot of IB switches and nodes! Not sure if anything is missing yet, but looking good so far!
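
For anyone who lands here later, a small pre-flight check along these lines might save a failed start. It is only a sketch: the function name `ifcfg_has_ipaddr` is made up, and it only checks that the interface file sets an IPADDR starting with a digit, not that the address is sensible for your network.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds if the given ifcfg file
# contains an IPADDR= line whose value starts with a digit.
ifcfg_has_ipaddr() {
    grep -Eq "^[[:space:]]*IPADDR=[\"']?[0-9]" "$1"
}

# Usage (on the UFM host, paths as used in this thread):
#   if ifcfg_has_ipaddr /etc/sysconfig/network-scripts/ifcfg-ib0; then
#       sudo ifdown ib0; sudo ifup ib0     # re-read the interface file
#       sudo /etc/init.d/ufmd start
#   else
#       echo "ifcfg-ib0 has no IPADDR; ufmd will likely refuse to start" >&2
#   fi
```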

Thank you again for the help!

Hi Derek,

Good to see that it is working as designed. Yes, even if you do not use IPoIB, UFM needs an IP address configured on the fabric interface; otherwise it will not start.

You mentioned that you have a support case open with us. I found it in the system, but it was closed because no support entitlement was found. Support for UFM can only be provided through a valid support entitlement. Since your customer has a valid license, please mention the end customer's name the next time you open a ticket so that we can locate the support entitlement automatically and support you through the ticket.

Cheers,
~Martijn

Hey Martijn,
Likewise, I’m glad to see it working!
That said, I would really like to see the documentation updated to explain what to do, ideally with an example configuration file. It’s very hard to work out what is needed when this step comes with no details and no example.
In my mind and in the gv.cfg file, I kept going back over, “Why does it need an IP for native/datagram mode? That makes no sense. Even if it does require an IP, I have no idea what subnet it should be in or where it should be defined…”

But to double check, it doesn’t require any specific IP, just “something” defined in the system network interface file for the InfiniBand interface?

Ok will do! Thank you for the tip!
One note on that subject: the only replies I ever got after submitting that ticket were “Additional Information required” emails. I clicked the link in the email and filled out the information three times, but the system never seemed to acknowledge it. It looked like a bug in that system, because after submitting it would complain that I wasn’t logged into Force.com (even when I was logged into https://support.mellanox.com ).
