@kim.dang check that your interfaces report transport as InfiniBand and link_layer as Ethernet.
For the interface with the assigned IP (the one used for testing), run ibv_devinfo -d rocep1s0f0 and post the output. It should look like this:
elsaco@spark2:~$ ibv_devinfo -d rocep1s0f0
hca_id: rocep1s0f0
transport: InfiniBand (0)
fw_ver: 28.45.4028
node_guid: 4cbb:4703:002d:a85d
sys_image_guid: 4cbb:4703:002d:a85d
vendor_id: 0x02c9
vendor_part_id: 4129
hw_ver: 0x0
board_id: NVD0000000087
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
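
If you want to check those two fields without reading the whole dump, something like the sketch below might help. check_roce is a hypothetical helper name (not part of rdma-core or NCCL); it only assumes the "transport:" and "link_layer:" labels shown in the output above. If a device has more than one port, it reports the last link_layer it sees.

```shell
# Hypothetical helper: reads `ibv_devinfo` output on stdin and reports whether
# it shows the InfiniBand transport on top of an Ethernet link layer (RoCE).
check_roce() {
    awk '
        # Leading tabs/spaces are ignored by awk default field splitting.
        $1 == "transport:"  { transport = $2 }
        $1 == "link_layer:" { link = $2 }
        END {
            if (transport == "InfiniBand" && link == "Ethernet") {
                print "OK: RoCE (transport=InfiniBand, link_layer=Ethernet)"
                exit 0
            }
            printf "MISMATCH: transport=%s link_layer=%s\n", transport, link
            exit 1
        }
    '
}

# Usage (device name taken from the example above):
#   ibv_devinfo -d rocep1s0f0 | check_roce
```

The function exits non-zero on a mismatch, so it can also be used in an if statement or a small script.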
Also, why not use NCCL_SOCKET_IFNAME instead of NCCL_IB_HCA, like in the connection test playbook?
From Environment Variables (NCCL 2.29.1 documentation):
The NCCL_SOCKET_IFNAME variable specifies which IP interfaces to use for communication.
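
A minimal sketch of what that could look like before launching the test. The interface name enp1s0f0np0 is only a placeholder I am assuming here; substitute the name of the interface that actually carries the assigned IP on your system (ip -o -4 addr show lists them). NCCL_DEBUG=INFO is optional and just makes NCCL log which interface it picked.

```shell
# Placeholder interface name (assumption): replace with the interface
# that holds the IP used for the test on your machine.
export NCCL_SOCKET_IFNAME=enp1s0f0np0

# Optional: have NCCL log its interface/transport selection at startup.
export NCCL_DEBUG=INFO
```

Set the same variables on both nodes, each with its own interface name if they differ.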