Joining and leaving a multicast group makes ib_close_al hang

Hi

In my application, ib_join_mcast and ib_leave_mcast both return success, but ib_close_al then hangs.

If I do not call ib_join_mcast and ib_leave_mcast, ib_close_al does not hang and the application exits normally.

I confirmed that the join succeeds: messages sent to the multicast group were received.

However, even if I do not try to receive any messages and simply call ib_join_mcast followed by ib_leave_mcast, the application hangs indefinitely in ib_close_al.

Are there any suggestions?

Output from the join callback:

Callback: ib_pfn_mcast_cb:
status=IB_SUCCESS
error_status=0
h_mcast =0x0000000000340FC0
p_member_rec:
mgid=0xFF12401BFFFF0000AB000000000000;
mlid=49160;
qkey=2843;
pkey=65535;
port_gid=0xFE80000000000000E41D2D03007536D1
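The callback prints mlid, qkey and pkey in decimal. Below is a minimal C++ sketch that decodes them at compile time. It assumes the values were printed in host byte order, and the helper names (is_multicast_lid, pkey_is_full_member, pkey_base) are hypothetical, not IBAL functions.

```cpp
#include <cstdint>

// Multicast LIDs occupy 0xC000..0xFFFE; 0xFFFF is the permissive LID.
constexpr bool is_multicast_lid(std::uint16_t lid) {
    return lid >= 0xC000 && lid <= 0xFFFE;
}

// Bit 15 of a P_Key is the membership bit (1 = full member).
constexpr bool pkey_is_full_member(std::uint16_t pkey) {
    return (pkey & 0x8000) != 0;
}

// The low 15 bits of a P_Key identify the partition; 0x7FFF is the default partition.
constexpr std::uint16_t pkey_base(std::uint16_t pkey) {
    return static_cast<std::uint16_t>(pkey & 0x7FFF);
}

// Values copied from the callback output (assumed to be host byte order).
constexpr std::uint16_t kMlid = 49160;
constexpr std::uint32_t kQkey = 2843;
constexpr std::uint16_t kPkey = 65535;

static_assert(kMlid == 0xC008 && is_multicast_lid(kMlid), "mlid is 0xC008, a multicast LID");
static_assert(kQkey == 0x0B1B, "qkey is 0x0B1B");
static_assert(pkey_is_full_member(kPkey) && pkey_base(kPkey) == 0x7FFF,
              "pkey 0xFFFF: full member of the default partition");
```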

Handle returned by ib_join_mcast:

handleMcast=0x0000000000340FC0

Output around ib_leave_mcast:

Leaving mcast group…0x0000000000340FC0
OK

Correction: the application does not actually hang indefinitely in ib_close_al as described above; the call times out after about a minute.

Environment:

IB driver: Mellanox OFED for Windows - WinOF VPI Rev 5.10.50000
OS: Windows Client 8.1

Hardware (output from vstat):

hca_idx=0
uplink={BUS=PCI_E Gen3, SPEED=8.0 Gbps, WIDTH=x8, CAPS=8.0*x8}
MSI-X={ENABLED=1, SUPPORTED=128, GRANTED=6, ALL_MASKED=N}
vendor_id=0x02c9
vendor_part_id=4099
hw_ver=0x0
fw_ver=2.35.5100
PSID=MT_1060110018
node_guid=e41d:2d03:0075:36d0
num_phys_ports=1
port=1
port_guid=e41d:2d03:0075:36d1
port_state=PORT_ACTIVE (4)
link_speed=10.00 Gbps
link_width=4x (2)
rate=40.00 Gbps
real_rate=32.00 Gbps (QDR)
port_phys_state=LINK_UP (5)
active_speed=10.00 Gbps
sm_lid=0x0003
port_lid=0x0001
port_lmc=0x0
transport=IB
max_mtu=4096 (5)
active_mtu=4096 (5)
GID[0]=fe80:0000:0000:0000:e41d:2d03:0075:36d1
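A few vstat numbers can be cross-checked against the rest of the post. This sketch, with hypothetical helper names, assumes the default link-local subnet prefix 0xFE80000000000000. It rebuilds GID[0] from port_guid and compares it with the port_gid reported in the join callback, and it reproduces rate and real_rate from link_width and link_speed, using the 8b/10b line encoding of QDR.

```cpp
#include <cstdint>

// GID[0] = 64-bit subnet prefix followed by the 64-bit port GUID.
struct Gid {
    std::uint64_t subnet_prefix;
    std::uint64_t interface_id;  // the port GUID
};

// Assumption: the subnet uses the default link-local prefix.
constexpr std::uint64_t kLinkLocalPrefix = 0xFE80000000000000ULL;

constexpr Gid gid_from_guid(std::uint64_t port_guid) {
    return Gid{kLinkLocalPrefix, port_guid};
}

constexpr bool same_gid(Gid a, Gid b) {
    return a.subnet_prefix == b.subnet_prefix && a.interface_id == b.interface_id;
}

// Signalling rate = number of lanes * per-lane speed.
constexpr std::uint32_t signalling_gbps(std::uint32_t lanes, std::uint32_t lane_gbps) {
    return lanes * lane_gbps;
}

// QDR uses 8b/10b encoding: 8 of every 10 bits on the wire carry data.
constexpr std::uint32_t qdr_data_gbps(std::uint32_t lanes, std::uint32_t lane_gbps) {
    return signalling_gbps(lanes, lane_gbps) * 8 / 10;
}

// port_guid from vstat vs. port_gid printed by the join callback.
constexpr Gid kFromVstat = gid_from_guid(0xE41D2D03007536D1ULL);
constexpr Gid kFromCallback{0xFE80000000000000ULL, 0xE41D2D03007536D1ULL};

static_assert(same_gid(kFromVstat, kFromCallback), "join used this port's GID[0]");
static_assert(signalling_gbps(4, 10) == 40, "rate=40.00 Gbps");
static_assert(qdr_data_gbps(4, 10) == 32, "real_rate=32.00 Gbps");
```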

Network:

Two desktops, each with one HCA.

OpenSM (its own console output):

OpenSM 3.3.11 UMAD
Command Line Arguments:
verbose option -D = 0xb
d level = 0x2
Debug mode: Force Log Flush
Creating new log file
Log file max size is 1 MBytes

OpenSM 3.3.11 UMAD
Entering DISCOVERING state
Using default GUID 0xe41d2d03006f0cb1
Entering MASTER state
SUBNET UP
osm_log: log file exceeds the limit 1048576. Truncating.


Additional info:

Initialization data for ib_join_mcast (joinInit holds values read from a configuration file; they are correct, since messages sent to the group were received):

ib_mcast_req_t mcast_req;
memset(&mcast_req, 0, sizeof(mcast_req));
mcast_req.create = 1;
mcast_req.mcast_context = this;
mcast_req.pfn_mcast_cb = &ib_pfn_mcast_cb;
mcast_req.timeout_ms = (uint32_t)-1;
mcast_req.retry_cnt = 3;
mcast_req.flags = IB_FLAGS_SYNC;
mcast_req.port_guid = joinInit.recvPort_.guid_;
mcast_req.pkey_index = 0;
mcast_req.member_rec.mgid = joinInit.mcastGroupGid_;
mcast_req.member_rec.pkey = joinInit.pkey_.net_;        // network byte order
mcast_req.member_rec.qkey = joinInit.qkey_;
mcast_req.member_rec.rate = joinInit.rate_;
mcast_req.member_rec.port_gid = joinInit.recvPort_.gid_;
mcast_req.member_rec.mtu = joinInit.mtu_;
mcast_req.member_rec.tclass = joinInit.serviceLevel_;   // GRH traffic class; the SL itself is packed into sl_flow_hop
mcast_req.member_rec.pkt_life = 0x81;                   // selector 2 (exactly), value 1
mcast_req.member_rec.sl_flow_hop = 0;
mcast_req.member_rec.scope_state = 0x01;                // high nibble: Scope, low nibble: JoinState (0x1 = full member)
mcast_req.member_rec.proxy_join = 0;
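Several member_rec fields above are packed values rather than plain numbers. This sketch spells out the packing as the InfiniBand MCMemberRecord lays it out: Scope and JoinState share scope_state, mtu/rate/pkt_life carry a 2-bit selector above a 6-bit value, and sl_flow_hop holds SL, flow label and hop limit. All helper and constant names are hypothetical, not IBAL functions, and sl_flow_hop is shown in host order before conversion to network order.

```cpp
#include <cstdint>

// scope_state: Scope in bits 7..4, JoinState in bits 3..0.
constexpr std::uint8_t make_scope_state(std::uint8_t scope, std::uint8_t join_state) {
    return static_cast<std::uint8_t>(((scope & 0x0F) << 4) | (join_state & 0x0F));
}
constexpr std::uint8_t scope_of(std::uint8_t scope_state) {
    return static_cast<std::uint8_t>(scope_state >> 4);
}
constexpr std::uint8_t join_state_of(std::uint8_t scope_state) {
    return static_cast<std::uint8_t>(scope_state & 0x0F);
}

// JoinState bits.
constexpr std::uint8_t kFullMember = 0x1;
constexpr std::uint8_t kNonMember = 0x2;
constexpr std::uint8_t kSendOnlyNonMember = 0x4;

// mtu, rate, pkt_life: selector in bits 7..6, value in bits 5..0.
// Selector 0 = greater than, 1 = less than, 2 = exactly, 3 = largest available.
constexpr std::uint8_t selector_of(std::uint8_t b) { return static_cast<std::uint8_t>(b >> 6); }
constexpr std::uint8_t value_of(std::uint8_t b) { return static_cast<std::uint8_t>(b & 0x3F); }

// sl_flow_hop (host order): SL in bits 31..28, FlowLabel in 27..8, HopLimit in 7..0.
constexpr std::uint32_t make_sl_flow_hop(std::uint8_t sl, std::uint32_t flow_label,
                                         std::uint8_t hop_limit) {
    return (static_cast<std::uint32_t>(sl & 0x0F) << 28) |
           ((flow_label & 0xFFFFFu) << 8) | hop_limit;
}

// The values used in the request above.
static_assert(scope_of(0x01) == 0 && join_state_of(0x01) == kFullMember, "scope_state = 0x01");
static_assert(selector_of(0x81) == 2 && value_of(0x81) == 1, "pkt_life = 0x81");
```

For comparison, the MGID prefix FF12 carries a scope of 2 (link-local) in the low nibble of its second byte, so the same encoding with that scope would give make_scope_state(2, kFullMember) == 0x21.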