Hello Chen,
I apologize for not getting a chance to respond earlier.
Regarding your questions:
I have verified that SR-IOV is configured according to the documentation available online. This is the dmesg output after I reboot the server:
[Sat Aug 24 20:48:42 2019] mlx4_core: Mellanox ConnectX core driver v4.6-1.0.1
[Sat Aug 24 20:48:42 2019] mlx4_core: Initializing 0000:00:10.0
[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: Detected virtual function - running in slave mode
[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: Sending reset
[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: Sending vhcr0
[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: Requested number of MACs is too much for port 1, reducing to 64
[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: HCA minimum page size:512
[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: Timestamping is not supported in slave mode
[Sat Aug 24 20:48:42 2019] mlx4_core: device is working in RoCE mode: Roce V1
[Sat Aug 24 20:48:42 2019] mlx4_core: UD QP Gid type is: V1
[Sat Aug 24 20:48:42 2019] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v4.6-1.0.1
[Sat Aug 24 20:48:42 2019] <mlx4_ib> check_flow_steering_support: Device managed flow steering is unavailable for IB port in multifunction env.
[Sat Aug 24 20:48:42 2019] <mlx4_ib> mlx4_ib_add: counter index 20 for port 1 allocated 0
[Sat Aug 24 20:48:42 2019] <mlx4_ib> mlx4_ib_add: counter index 21 for port 2 allocated 0
[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: mlx4_ib: multi-function enabled
[Sat Aug 24 20:48:42 2019] mlx4_core 0000:00:10.0: mlx4_ib: operating in qp1 tunnel mode
[Sat Aug 24 20:48:42 2019] pps_core: LinuxPPS API ver. 1 registered
[Sat Aug 24 20:48:42 2019] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[Sat Aug 24 20:48:42 2019] PTP clock support registered
[Sat Aug 24 20:48:42 2019] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.6-1.0.1
[Sat Aug 24 20:48:42 2019] card: mlx4_0, QP: 0xad0, inline size: 120
[Sat Aug 24 20:48:42 2019] card: mlx4_0, QP: 0xad8, inline size: 120
[Sat Aug 24 20:48:42 2019] IPv6: ADDRCONF(NETDEV_UP): ib1: link is not ready
[Sat Aug 24 20:48:42 2019] IPv6: ADDRCONF(NETDEV_CHANGE): ib1: link becomes ready
[Sat Aug 24 20:49:24 2019] random: crng init done
[Sat Aug 24 20:49:24 2019] random: 7 urandom warning(s) missed due to ratelimiting
[Sat Aug 24 21:16:35 2019] iscsi: registered transport (iser)
[Sat Aug 24 21:16:36 2019] scsi host3: iSCSI Initiator over iSER
I have also been able to run ib_send_bw / ib_read_bw successfully between the initiator and target hosts, and running lspci gives the following output:
00:10.0 Infiniband controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
Please advise if there are other parameters that I need to check to verify functionality.
Regarding the iSER target, I have been able to connect to it successfully from the same hypervisor.
On the hypervisor, when I run iscsiadm -m session, I get the following:
root@pve:~# iscsiadm -m session
iser: [1] 10.1.1.1:3260,1 iqn.1986-03.com.sun:02:a7488577-9989-40c0-9aaa-e2b23c93b387 (non-flash)
root@pve:~#
Running lsscsi also lists the iSER target LUNs:
[13:0:0:0] disk SUN COMSTAR 1.0 /dev/sds
[13:0:0:1] disk SUN COMSTAR 1.0 /dev/sdt
Getting back to the KVM guest: when I execute iscsiadm -m node -l, the only line that gets added to dmesg (/var/messages) is:
[Sat Aug 24 21:57:53 2019] scsi host3: iSCSI Initiator over iSER
And there is no other output.
However, the login process fails with the following error:
Logging in to [iface: default, target: iqn.1986-03.com.sun:02:a7488577-9989-40c0-9aaa-e2b23c93b387, portal: 10.1.1.1,3260] (multiple)
iscsiadm: Could not login to [iface: default, target: iqn.1986-03.com.sun:02:a7488577-9989-40c0-9aaa-e2b23c93b387, portal: 10.1.1.1,3260].
iscsiadm: initiator reported error (11 - iSCSI PDU timed out)
iscsiadm: Could not log into all portals
There are no other messages in /var/messages (dmesg).
I am using ConnectX-3 cards on both hosts. Both cards have firmware version 2.42.5000:
CA 'mlx4_0'
CA type: MT4100
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 1
Please advise what else I can do to debug this.
Thanks in advance.