Rivermax SDK example code fails with "CQE error"

Hi experts,

I am running the Rivermax SDK example code, but it fails with a “CQE error”.

CODE: media_sender.exe(Release-CUDA)
GPU: NVIDIA RTX A4000
CUDA: 12.1
Driver: 531.14
Windows: 10 Pro 21H2

RUN CMD: .\media_sender.exe -c 1 -a 2 -s .\sdps_samples\sdp_2110-20_narrow_gap_1080p50fps.txt -g 0 -r --max_gpu_freq

Logs as follows:

PS D:\wlx\Rivermax\tests> .\media_sender.exe -c 1 -a 2 -s .\sdps_samples\sdp_2110-20_narrow_gap_1080p50fps.txt -g 0 -r --max_gpu_freq
#############################################

Rivermax SDK version: 1.30.16
Media sender version: 1.30.16
#############################################
Set env variable CUDA_DEVICE_ORDER=PCI_BUS_ID
gpu_device_id = 0
Writing log to default location: C:\Users\enlightv\AppData\Local\Temp\rivermax_0606_181623_3736.log
Created log file: C:\Users\enlightv\AppData\Local\Temp\rivermax_0606_181623_3736.log
[23-06-06 18:16:23.411369] Tid: 001016 info [InitLogger:92] Logger started
[23-06-06 18:16:23.411436] Tid: 001016 info [rmax_init:610] starting Rivermax: SDK version 1.30.16
[23-06-06 18:16:23.412164] Tid: 001016 debug [Clock:31]
[23-06-06 18:16:23.412218] Tid: 001016 debug [SysClock:41]
[23-06-06 18:16:23.412272] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_DISABLE_VIDEO_GROUPING to the value true
[23-06-06 18:16:23.412327] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_VIDEO_PACE_INTERVAL to the value 1000000
[23-06-06 18:16:23.412402] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_OUT_STREAM_SIZE_IN_PKTS to the value 32768
[23-06-06 18:16:23.412478] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_HEADER_STRIDE_SIZE to the value 64
[23-06-06 18:16:23.412542] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_DISABLE_FLOW_ID to the value false
[23-06-06 18:16:23.412615] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_SDP_PARSER_ENABLE_LOGGING to the value true
[23-06-06 18:16:23.412672] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_ENABLE_PTP_HW_RT_CLOCK to the value false
[23-06-06 18:16:23.412740] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_ENABLE_CUDA to the value true
[23-06-06 18:16:23.412796] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_ENABLE_STATISTICS to the value false
[23-06-06 18:16:23.412854] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_ENABLE_API_VERIFICATION to the value false
[23-06-06 18:16:23.412909] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_DISABLE_AUDIO_BUFFERING to the value false
[23-06-06 18:16:23.412968] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_SESSION_MAP_SIZE to the value 2000
[23-06-06 18:16:23.413027] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_SESSION_MAP_SIZE to the value 2000
[23-06-06 18:16:23.413266] Tid: 001016 debug [EventHandlerManager:125]
[23-06-06 18:16:23.413293] Tid: 001016 info [EventHandlerManager:132] will wakeup before frame begin event in 2000000 ns
[23-06-06 18:16:23.413332] Tid: 001016 debug [EventHandlerManagerHigh:259]
[23-06-06 18:16:23.413381] Tid: 001016 debug [start_thread:341] Starting internal thread
[23-06-06 18:16:23.413462] Tid: 001016 debug [rivermax_set_thread_affinity:719] successfully set thread affinity using cpu mask: 0x2, previous mask: 0xff
[23-06-06 18:16:23.413499] Tid: 001016 debug [start_thread:344] Started event handler thread
[23-06-06 18:16:23.413564] Tid: 001016 debug [init_globals:249] Time now is 1686046583413563900
[23-06-06 18:16:23.413732] Tid: 006880 info [print_thread_info:117] High priority internal thread: PID = 3736, thread ID = 6880
[23-06-06 18:16:23.414238] Tid: 001016 debug [load_provider:69] dpcp[0] = 6008006000000 ‘Mellanox ConnectX-6 Dx Adapter’
[23-06-06 18:16:23.414264] Tid: 001016 debug [load_provider:69] dpcp[1] = 6008007000000 ‘Mellanox ConnectX-6 Dx Adapter #2
[23-06-06 18:16:23.414289] Tid: 001016 info [init:37] DPCP/DevX provider was loaded
[23-06-06 18:16:23.461965] Tid: 001016 debug [getAdapterInfo:473] Adapter 以太网 3 vlanId 0 len 6 MAC 04:3f:72:a4:99:90
[23-06-06 18:16:23.465444] Tid: 001016 debug [getAdapterInfo:511] LUID 6008006000000 0x15b3/0x101d dpcp_adapter 0x2051b5b1be0 opened true ret 0
[23-06-06 18:16:23.465551] Tid: 001016 debug [getAdapterInfo:528] IP: 192.168.5.44 VLAN_ID: 0 Serial number: MT2035X03235
[23-06-06 18:16:23.465608] Tid: 001016 debug [getAdapterInfo:530] MTU: 1500 TXlinkSpeed: 100 Gbps RXLinkSpeed:100 Gbps
[23-06-06 18:16:23.465669] Tid: 001016 info [getAdapterInfo:535] Device with IP addr: 192.168.5.44 was added to Device Collection [1]
[23-06-06 18:16:23.465724] Tid: 001016 warning [getAdapterInfo:430] Adapter 以太网 4 luidIdx 0x8007 is not Up
[23-06-06 18:16:23.468125] Tid: 001016 debug [getAdapterInfo:473] Adapter 以太网 4 vlanId 0 len 6 MAC 04:3f:72:a4:99:91
[23-06-06 18:16:23.471336] Tid: 001016 debug [getAdapterInfo:511] LUID 6008007000000 0x15b3/0x101d dpcp_adapter 0x2051b5b1cc0 opened true ret 0
[23-06-06 18:16:23.471519] Tid: 001016 debug [getAdapterInfo:528] IP: 169.254.90.23 VLAN_ID: 0 Serial number: MT2035X03235
[23-06-06 18:16:23.471575] Tid: 001016 debug [getAdapterInfo:530] MTU: 1500 TXlinkSpeed: 18446744073 Gbps RXLinkSpeed:18446744073 Gbps
[23-06-06 18:16:23.471633] Tid: 001016 info [getAdapterInfo:535] Device with IP addr: 169.254.90.23 was added to Device Collection [2]
[23-06-06 18:16:23.474342] Tid: 001016 debug [getAdapterInfo:473] Adapter 以太网 2 vlanId 0 len 6 MAC d4:5d:64:d2:c1:49
[23-06-06 18:16:23.474400] Tid: 001016 debug [getAdapterInfo:515] DPCP device with LUID 6008005000000 not found!
[23-06-06 18:16:23.474454] Tid: 001016 info [~winDevice:97] ~winDevice DTOR
[23-06-06 18:16:23.477058] Tid: 001016 debug [getAdapterInfo:473] Adapter Loopback Pseudo-Interface 1 vlanId 0 len 0 MAC 00:00:00:00:00:00
[23-06-06 18:16:23.481534] Tid: 001016 debug [GetPhysicalAdapterByMAC:358] No physical device found, bypassing
[23-06-06 18:16:23.481591] Tid: 001016 debug [getAdapterInfo:486] Physical adapter GUID wasn’t found, bypassing
[23-06-06 18:16:23.481648] Tid: 001016 info [~winDevice:97] ~winDevice DTOR
[23-06-06 18:16:23.505503] Tid: 001016 info [license_validate_v4:446] Licensed to: Beijing Enlightv Co., Ltd (N/A), evaluation period expires in 24 days
[23-06-06 18:16:23.505603] Tid: 001016 info [info_product:466] Rivermax license version: 4.1
[23-06-06 18:16:23.506273] Tid: 001016 info [license_validate:516] Rivermax license id 827d7712-80a4-1938-6474-902c070f7f24, revision 1
[23-06-06 18:16:23.506337] Tid: 001016 info [rmax_init:638] Statistics disabled
[23-06-06 18:16:23.506408] Tid: 001016 info [cuda_enable_etbl:362] Starting Cuda init
[23-06-06 18:16:23.506499] Tid: 001016 info [cuda_enable_etbl:396] Cuda init Done
List of supported devices:
Device with interface name: 以太网 3, IP addresses: [ 192.168.5.44 ], MAC address: 04:3f:72:a4:99:90, device_id: 4125, serial number: MT2035X03235
Device with interface name: 以太网 4, IP addresses: [ 169.254.90.23 ], MAC address: 04:3f:72:a4:99:91, device_id: 4125, serial number: MT2035X03235
[23-06-06 18:16:23.508224] Tid: 001016 debug [Clock:31]
[23-06-06 18:16:23.508254] Tid: 001016 debug [ExternalClock:66]
[23-06-06 18:16:23.508275] Tid: 001016 debug [~SysClock:46]
[23-06-06 18:16:23.508298] Tid: 001016 debug [~Clock:36]
[23-06-06 18:16:23.508321] Tid: 001016 debug [rmx_use_user_clock_v1:324] Using user time handler
TX Thread: 0 Mask: 0x4
@@@cudaAllocateMmap:0 size:25165824 align:0
CUDA memory allocation on GPU - cuMemCreate
RDMA is supported and enabled, status
CUDA memory allocation on GPU - cuMemCreate Done
GPU allocation succeeded, GPU id = 0 ,size = 25165824
Note: Allocation using huge pages size requested 1105920 is smaller then one page size: 2097152
Allocated 2097152 bytes using Large Pages
sdp for stream 0 is:
v=0
o=- 1443716955 1443716955 IN IP4 192.168.5.44
s=SMPTE ST2110-20 narrow gap 1080p50
t=0 0
m=video 2000 RTP/AVP 96
c=IN IP4 224.1.1.1/64
a=source-filter: incl IN IP4 224.1.1.1 192.168.5.44
a=rtpmap:96 raw/90000
a=fmtp:96 sampling=YCbCr-4:2:2; width=1920; height=1080; exactframerate=50; depth=10; TCS=SDR; colorimetry=BT709; PM=2110GPM; SSN=ST2110-20:2017; TP=2110TPN; TSMODE=SAMP; TSDELAY=0
a=mediaclk:direct=0
a=ts-refclk:localmac=40-a3-6b-a0-2b-d2

[23-06-06 18:16:23.581104] Tid: 001016 info [init_large_pages:35] huge pages are supported with page size 2097152
[23-06-06 18:16:23.581143] Tid: 001016 debug [hugePageAlloc:67] allocted 2097152 memory at 0x20532a00000 factor 1 allocSize 2097152
[23-06-06 18:16:23.581172] Tid: 001016 debug [rivermax_get_user_env:151] parsed env RIVERMAX_ENABLE_MP_WQE to the value false
[23-06-06 18:16:23.581196] Tid: 001016 debug [SessionTX:86] MP_WQE disabled for session
[23-06-06 18:16:23.581230] Tid: 001016 info [sdp_parse:594] trying to parse using smpte2110…
[23-06-06 18:16:23.581280] Tid: 001016 info [sdp_parse:610] sdp parsed successfully
[23-06-06 18:16:23.581305] Tid: 001016 info [license_assert_device:719] Validating Rivermax license for device with local ip 192.168.5.44
[23-06-06 18:16:23.581328] Tid: 001016 info [is_sn_matched:144] No serial number restriction
[23-06-06 18:16:23.581349] Tid: 001016 debug [session_tx_initialization:1001] got 4 blocks, 16 stride in chunk, 4320 packets per frame, network_len 46
[23-06-06 18:16:23.581373] Tid: 001016 debug [session_tx_initialization:1024] processing block 0 with 4320 packets
[23-06-06 18:16:23.581393] Tid: 001016 debug [session_tx_initialization:1026] processing application header
[23-06-06 18:16:23.581420] Tid: 001016 debug [session_tx_initialization:1024] processing block 1 with 4320 packets
[23-06-06 18:16:23.581440] Tid: 001016 debug [session_tx_initialization:1026] processing application header
[23-06-06 18:16:23.581467] Tid: 001016 debug [session_tx_initialization:1024] processing block 2 with 4320 packets
[23-06-06 18:16:23.581486] Tid: 001016 debug [session_tx_initialization:1026] processing application header
[23-06-06 18:16:23.581514] Tid: 001016 debug [session_tx_initialization:1024] processing block 3 with 4320 packets
[23-06-06 18:16:23.581535] Tid: 001016 debug [session_tx_initialization:1026] processing application header
[23-06-06 18:16:23.581629] Tid: 001016 debug [session_tx_initialization:1088] fix intv is every 50 frames
[23-06-06 18:16:23.581653] Tid: 001016 info [session_tx_initialization:1166] Detected ST2110-20 video stream
[23-06-06 18:16:23.581678] Tid: 001016 debug [init:85] MP_WQE disabled for ring
[23-06-06 18:16:23.581699] Tid: 001016 info [init:20] do open: true
[23-06-06 18:16:23.581950] Tid: 001016 debug [init:344] cpu_vec 0x0 eqn 7
[23-06-06 18:16:23.581981] Tid: 001016 debug [init:354] Reserved MKey created lkey=0x700 addr=0x2051b5c9e28
[23-06-06 18:16:23.582002] Tid: 001016 debug [init:362] Adapter frequency (khz) 1000000
[23-06-06 18:16:23.582021] Tid: 001016 debug [init:366] DPP supported is enabled
[23-06-06 18:16:23.582040] Tid: 001016 debug [calculate:259] rate 2271600 pps 216000, DI 0.000000, burst size 1262 inter burst gap 4444.44 accurate ibg 4444.44 active_time 0.96 , inter packet_gap 4444.44
[23-06-06 18:16:23.582100] Tid: 001016 debug [create_comp_channel:163] created completion channel: 0x205280e6150 with handle 0x32c
[23-06-06 18:16:23.583416] Tid: 001016 debug [create_cq:207] created CQ sz 32768 cqn 0x434
[23-06-06 18:16:23.583438] Tid: 001016 debug [get_dv_cq:305] CQ id 0x434
[23-06-06 18:16:23.622416] Tid: 001016 debug [create_pp_sq:1056] created packet pacing SQ 0x20519802d50 state SQ_RDY status 0 wqe 0x20532e16000 stride num/sz 32768/64 sq 4350
[23-06-06 18:16:23.622522] Tid: 001016 debug [create_cq_sq:316] got prm cq buf 0x20532c06000 sq buf 0x20532e16000
[23-06-06 18:16:23.623257] Tid: 001016 debug [SenderSG:53] SQ num 0x10fe buf 0x20532e16000 stride 64 cnt 32768 dummyInt 0 extra_dummy 0
[23-06-06 18:16:23.623337] Tid: 001016 debug [Mlx5Poll:34] cq num 0x434 cqe size 64 cq size 32768 cqn 1076 dbrec 0x20527f8ff40
[23-06-06 18:16:23.623914] Tid: 001016 debug [bind:180] called bind to ip 192.168.5.44 port 52894
[23-06-06 18:16:23.624036] Tid: 001016 debug [fill_net_header:103] Final DstMAC=01:00:5e:01:01:01 vlan_id=0
[23-06-06 18:16:23.624096] Tid: 001016 debug [fill_net_header:120] DSCP=0
[23-06-06 18:16:23.624157] Tid: 001016 debug [fill_net_header:122] ECN=0
[23-06-06 18:16:23.624213] Tid: 001016 debug [fill_headers:188] Resolved Src: 192.168.5.44 to SrcMAC=04:3f:72:a4:99:90 Dst: 224.1.1.1 to DstMAC=01:00:5e:01:01:01 VLANId=0
[23-06-06 18:16:23.624294] Tid: 001016 debug [prepare_headers:119] network len 42 max_usr_hdr 64 stride length is 128
[23-06-06 18:16:23.624370] Tid: 001016 debug [hugePageAlloc:67] allocted 4194304 memory at 0x20535000000 factor 2 allocSize 4194304
[23-06-06 18:16:23.624917] Tid: 001016 debug [create_direct_mkey:487] map sz = 2 lkey 0x4948
[23-06-06 18:16:23.624975] Tid: 001016 debug [prepare_headers:140] done preparing raw network header in address 0x20535000000 with size 2211840 lkey 0x4948 total header allocated 17280
[23-06-06 18:16:23.625052] Tid: 001016 debug [session_tx_initialization:1415] calculated 180 DI in gap, one extra dummy every 1.7053e-13 frames
[23-06-06 18:16:23.625124] Tid: 001016 debug [ChunkMgr:44] creating chunkmgr mem_block_array_len: 4 m_chunk_size_in_stride: 16 data_stride_size: 1280, app header 64
[23-06-06 18:16:23.625479] Tid: 001016 debug [create_direct_mkey:487] map sz = 3 lkey 0x4a49
[23-06-06 18:16:23.626055] Tid: 001016 debug [create_direct_mkey:487] map sz = 4 lkey 0x4b4a
[23-06-06 18:16:23.626749] Tid: 001016 debug [create_direct_mkey:487] map sz = 5 lkey 0x4c4b
[23-06-06 18:16:23.627514] Tid: 001016 debug [create_direct_mkey:487] map sz = 6 lkey 0x4d4c
[23-06-06 18:16:23.627799] Tid: 001016 info [disable_mp_wqe:104] MP WQE disabled for ring 0x2052836b2b0
[23-06-06 18:16:23.627864] Tid: 001016 debug [ChunkMgr:254] MP_WQE disabled for ring 0
[23-06-06 18:16:23.627892] Tid: 001016 debug [add_tx_session_to_map:36] created new TX session with id 0
[23-06-06 18:16:23.627967] Tid: 001016 debug [add_tx_session_to_map:40] adding session 0 to map with period 2e+07
Stream ID: 0
Source: 192.168.5.44:52894
Destination: 224.1.1.1:2000
Successfully set thread affinity using cpu mask: 0x4, previous mask: 0xff
running 1 streams, each mem_block using 270 chunks, each frame has 270 chunks, each chunk has 16 strides, sending 4320 packets per frame, 50 frames per second, frame/field duration: 20000 [us]
running scenario with: chunks in frame: 270 chunks in mem_block: 270 strides in chunk : 16 first commit in ms: 1010.941440
[23-06-06 18:16:24.653137] Tid: 009628 error [poll:55] idx 1 wqe id 0 CQE error, vendor syndrome=0x51, HW syndrome=0x2, HW syndrome type=0x0 syndrome=0x4
[23-06-06 18:16:24.653264] Tid: 009628 error [poll:57] send_code 0xfe wqe_cnt 0 user_idx 0x434

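As a sanity check that the pacing parameters in the log are self-consistent (so the stream setup itself is probably not what is failing), the numbers in the `calculate:259` line can be reproduced from the stream description. This is only a sketch: I am assuming the inter packet gap is in nanoseconds, the rate is in kbit/s, and the burst size is in bytes, since the log does not state units. The variable names are mine.

```python
# Values taken from the log above.
PACKETS_PER_FRAME = 4320      # "sending 4320 packets per frame"
FRAMES_PER_SECOND = 50        # "50 frames per second"
ACTIVE_TIME = 0.96            # "active_time 0.96"
BURST_SIZE_BYTES = 1262       # "burst size 1262" (unit assumed)

# Packets per second: 4320 * 50.
pps = PACKETS_PER_FRAME * FRAMES_PER_SECOND

# Frame duration in nanoseconds: 1 s / 50 = 20 ms, matching "20000 [us]".
frame_duration_ns = 1_000_000_000 / FRAMES_PER_SECOND

# Packets are spread over the active part of the frame only.
inter_packet_gap_ns = frame_duration_ns * ACTIVE_TIME / PACKETS_PER_FRAME

# Bit rate during the active part of the frame, in kbit/s (unit assumed).
rate_kbps = BURST_SIZE_BYTES * 8 * pps / ACTIVE_TIME / 1000

if __name__ == "__main__":
    print(f"pps               = {pps}")
    print(f"inter packet gap  = {inter_packet_gap_ns:.2f} ns")
    print(f"rate              = {rate_kbps:.0f} kbit/s")
```

Under those assumptions the results line up with the logged `pps 216000`, `inter packet_gap 4444.44` and `rate 2271600`.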

Dear @wanglx

Thanks for your post.

This looks like a GPU-related problem. Unfortunately, I can’t provide more details right now, but I’ll follow up once I have updates.

Alternatively, you can open a case with Enterprise Support, and we’ll do our best to track and debug the issue.

Regards,
Vladislav

Dear vkhomyakov,

Case 1 works, but case 2 still fails. Both cases use the same GPU and NIC; the only differences are the mainboard and CPU.


case 1:
PC: WS X299 SAGE/INTEL I9
GPU:RTX A6000ADA
NIC:MCX623106AN-CDA_Ax

case 2:
PC: Dell Precision 7865/AMD 5995WX
GPU:RTX A6000ADA
NIC:MCX623106AN-CDA_Ax