iSER over SR-IOV

Hello,

I’m running into an issue initiating an iSER connection to an iSER-enabled storage server. The connection times out when the initiator tries to log into the target. The iSER initiator is a KVM guest running Ubuntu 18.04 with all the latest updates, and my target is a bare-metal box, also Ubuntu 18.04 with all updates applied.

I have successfully configured SR-IOV for the KVM guest using this Mellanox knowledge article:

https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-3-with-kvm–infiniband-x#jive_content_id_Manage_the_VM
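For reference, this is roughly how I confirmed the VF is visible inside the guest (device and interface names are from my setup and may differ elsewhere):

# Confirm the ConnectX-3 virtual function is exposed to the guest
lspci | grep -i mellanox

# Confirm the RDMA device, port state and LID
ibstat mlx4_0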

I’ve successfully verified RDMA connectivity between the initiator and target using ib_send_bw, udaddy, rdma_server, rdma_client.

Output of running ib_send_bw -i 2 10.0.X.X:


Send BW Test

Dual-port : OFF Device : mlx4_0

Number of qps : 1 Transport type : IB

Connection type : RC Using SRQ : OFF

TX depth : 128

CQ Moderation : 100

Mtu : 2048[B]

Link type : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet


local address: LID 0x07 QPN 0x0d64 PSN 0xff3d62

remote address: LID 0x04 QPN 0x0219 PSN 0xf01f6d


#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]

65536 1000 3718.02 3716.43 0.059463


I’ve also verified that the target is set up for iSER:

/iscsi/iqn.20…0.XX.XX:3260> ls

o- 10.0.XX.XX:3260 … [iser]
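For completeness, iSER was enabled on that portal in targetcli roughly like this (assuming the default tpg1; the IQN and IP below are placeholders matching the redacted values above):

/iscsi/iqn.2003-01.org.setup.lun.test/tpg1/portals/10.0.XX.XX:3260> enable_iser boolean=true

/> saveconfig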

Unfortunately, any time I try to log into the target from the initiator, it just hangs until it times out:

Logging in to [iface: default, target: iqn.2003-01.org.setup.lun.test, portal: 10.0.XX.XX,3260] (multiple)

iscsiadm: Could not login to [iface: default, target: iqn.2003-01.org.setup.lun.test, portal: 10.0.XX.X,3260].

iscsiadm: initiator reported error (8 - connection timed out)

iscsiadm: Could not log into all portals

I’ve verified that the initiator works when the transport mode is set to tcp; it only fails when it’s set to iser.
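For reference, the transport on the node record was switched from tcp to iser with something like the following (same IQN and portal as above, with the address redacted):

iscsiadm -m node -T iqn.2003-01.org.setup.lun.test -p 10.0.XX.XX:3260 --op update -n iface.transport_name -v iser

iscsiadm -m node -T iqn.2003-01.org.setup.lun.test -p 10.0.XX.XX:3260 --login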

I have also verified that ib_iser is loaded on both servers. I have managed to establish iSER connections to this target from other bare-metal initiators; the only difference is that the initiator I’m using now is a virtual PCIe device (SR-IOV VF) in KVM.
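The module check on each side was just:

lsmod | grep ib_iser

# Load it explicitly if it is not listed
modprobe ib_iser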

I have MLNX_OFED_LINUX-4.6-1.0.1.1 running on both ends.

Can someone please help me identify what I’m missing here?

Thanks in advance.

Hi Gurvinderpal,

As I have no ability to parse logs and run a comprehensive root-cause analysis here, and since you have not mentioned the specific Mellanox NICs & firmware you are using on the initiator & target, I can only suggest the following:

  1. Check that SR-IOV is configured properly and that you haven’t missed any parameter settings (see the sketch below this list).

  2. Monitor the /var/log/messages & dmesg outputs to understand what the actual connection issue is.

  3. Use the Mellanox article below to ensure you have configured iSER (initiator & target) per Mellanox best practice:

https://community.mellanox.com/s/article/howto-configure-tgt-enabled-with-iser-transport-for-ubuntu
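For points 1 and 2, something along these lines can help (device names, paths and log locations below are examples, adjust them to your setup):

# On the hypervisor: confirm the VFs exist and the mlx4_core SR-IOV parameters took effect
lspci | grep -i "virtual function"

cat /sys/module/mlx4_core/parameters/num_vfs

# On both initiator and target: follow the kernel and syslog output while reproducing the login
dmesg -w

tail -f /var/log/messages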

Other than that, if you still need a deeper level of debugging from Mellanox, don’t hesitate to approach support@mellanox.com.

Best Regards,

Chen

Hello Chen,

I apologize for not getting a chance to respond earlier.

Regarding your questions:

I have verified that SR-IOV is configured per the documents available online. This is the output when I reboot the server and look at dmesg:

[Sat Aug 24 20:48:42 2019] mlx4_core: Mellanox ConnectX core driver v4.6-1.0.1

[Sat Aug 24 20:48:42 2019] mlx4_core: Initializing 0000:00:10.0

[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: Detected virtual function - running in slave mode

[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: Sending reset

[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: Sending vhcr0

[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: Requested number of MACs is too much for port 1, reducing to 64

[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: HCA minimum page size:512

[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: Timestamping is not supported in slave mode

[Sat Aug 24 20:48:42 2019] mlx4_core: device is working in RoCE mode: Roce V1

[Sat Aug 24 20:48:42 2019] mlx4_core: UD QP Gid type is: V1

[Sat Aug 24 20:48:42 2019] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.6-1.0.1

[Sat Aug 24 20:48:42 2019] <mlx4_ib> check_flow_steering_support: Device managed flow steering is unavailable for IB port in multifunction env.

[Sat Aug 24 20:48:42 2019] <mlx4_ib> mlx4_ib_add: counter index 20 for port 1 allocated 0

[Sat Aug 24 20:48:42 2019] <mlx4_ib> mlx4_ib_add: counter index 21 for port 2 allocated 0

[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: mlx4_ib: multi-function enabled

[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: mlx4_ib: operating in qp1 tunnel mode

[Sat Aug 24 20:48:42 2019] pps_core: LinuxPPS API ver. 1 registered

[Sat Aug 24 20:48:42 2019] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti giometti@linux.it

[Sat Aug 24 20:48:42 2019] PTP clock support registered

[Sat Aug 24 20:48:42 2019] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.6-1.0.1

[Sat Aug 24 20:48:42 2019] card: mlx4_0, QP: 0xad0, inline size: 120

[Sat Aug 24 20:48:42 2019] card: mlx4_0, QP: 0xad8, inline size: 120

[Sat Aug 24 20:48:42 2019] IPv6: ADDRCONF(NETDEV_UP): ib1: link is not ready

[Sat Aug 24 20:48:42 2019] IPv6: ADDRCONF(NETDEV_CHANGE): ib1: link becomes ready

[Sat Aug 24 20:49:24 2019] random: crng init done

[Sat Aug 24 20:49:24 2019] random: 7 urandom warning(s) missed due to ratelimiting

[Sat Aug 24 21:16:35 2019] iscsi: registered transport (iser)

[Sat Aug 24 21:16:36 2019] scsi host3: iSCSI Initiator over iSER

Also, I’ve been able to successfully run ib_send_bw / ib_read_bw between the initiator and target hosts, and running lspci gives the following output:

00:10.0 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

Please advise if there are other parameters that I need to check to verify functionality.

Regarding the iSER target, I have been able to successfully connect to that target from the same hypervisor.

On the hypervisor, when I run iscsiadm -m session, I get the following:

root@pve:~# iscsiadm -m session

iser: [1] 10.1.1.1:3260,1 iqn.1986-03.com.sun:02:a7488577-9989-40c0-9aaa-e2b23c93b387 (non-flash)

root@pve:~#

Also, running lsscsi lists the iSER target LUNs:

[13:0:0:0] disk SUN COMSTAR 1.0 /dev/sds

[13:0:0:1] disk SUN COMSTAR 1.0 /dev/sdt

Getting back to the KVM guest: when I execute iscsiadm -m node -l, the only output that gets added to dmesg / /var/log/messages is:

[Sat Aug 24 21:57:53 2019] scsi host3: iSCSI Initiator over iSER

And there is no other output.

The error from the login process is as follows:

Logging in to [iface: default, target: iqn.1986-03.com.sun:02:a7488577-9989-40c0-9aaa-e2b23c93b387, portal: 10.1.1.1,3260] (multiple)

iscsiadm: Could not login to [iface: default, target: iqn.1986-03.com.sun:02:a7488577-9989-40c0-9aaa-e2b23c93b387, portal: 10.1.1.1,3260].

iscsiadm: initiator reported error (11 - iSCSI PDU timed out)

iscsiadm: Could not log into all portals

But there are no other messages in /var/log/messages (dmesg).
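If it helps, I can rerun the login with iscsiadm’s debug level raised to capture more detail, e.g.:

iscsiadm -m node -T iqn.1986-03.com.sun:02:a7488577-9989-40c0-9aaa-e2b23c93b387 -p 10.1.1.1:3260 --login -d 8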

I am using ConnectX-3 cards on both hosts. Both cards have firmware 2.42.5000:

CA ‘mlx4_0’

CA type: MT4100

Number of ports: 2

Firmware version: 2.42.5000

Hardware version: 1

Please advise what else I can do to debug this.
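If useful, I can also stop the iscsid service and run the daemon in the foreground with debug output while retrying the login (the service name may differ between distributions):

systemctl stop iscsid

iscsid -f -d 8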

Thanks in advance.