hello expert
I create one simple client and server with rdma rc, client send simple request, and server do simple process and build one reply and send reply to client, during wait server send reply back to client, the latency is super high when we wait reply sent to client with rdma_get_send_comp.
please notice, we use only one thread for for client and server, client only send request sync one by one. why rdma_get_send_comp blocked so long?
root@smcx12svr01:/mnt/nvme/wayne/rdmaclientserverwithep/build# ./rdma_server_folly
[INFO] === Simplified RDMA Server with C++ Thread Pool and Folly Fast Map ===
[INFO] Configuration:
[INFO] Port: 8080
[INFO] Buffer Pool Size: 8
[INFO] Max Connections: 1000
[INFO] Worker Threads: 152
[INFO] Simplified RDMA Server initialized with 152 max worker threads, buffer pool size 8
[INFO] === Simplified RDMA Server Initialization ===
[INFO] Step 1: Setting up RDMA listening endpoint...
[INFO] Step 1.1: Resolving server address 192.168.100.150:8080
[DEBUG] Address resolution hints configured:
[DEBUG] ai_flags: RAI_PASSIVE (server mode)
[DEBUG] ai_port_space: RDMA_PS_TCP
[DEBUG] ai_family: AF_INET (IPv4)
[DEBUG] Calling rdma_getaddrinfo with port: 8080
[SUCCESS] Server address resolved successfully
[INFO] Step 1.2: Creating listening endpoint...
[DEBUG] QP initialization attributes configured:
[DEBUG] max_send_wr: 8
[DEBUG] max_recv_wr: 8
[DEBUG] max_send_sge: 1
[DEBUG] max_recv_sge: 1
[DEBUG] max_inline_data: 256
[DEBUG] sq_sig_all: 1
[DEBUG] Calling rdma_create_ep to create listening endpoint...
[SUCCESS] Listening endpoint created successfully
[DEBUG] Listening endpoint details:
[DEBUG] listen_id: 0x5f70f0f27230
[DEBUG] verbs: available
[DEBUG] Device: mlx5_1
[INFO] Step 1.3: Starting to listen for connections...
[DEBUG] Calling rdma_listen with backlog: 10
[SUCCESS] Simplified RDMA Server successfully initialized and listening on 192.168.100.150:8080
[INFO] Server ready to accept client connections with pre-allocated resources
[INFO] Configuration summary:
[INFO] Buffer Pool Size: 8 buffers per connection
[INFO] Max Connections: 1000
[INFO] Worker Threads: 152 (max)
[INFO] Pre-allocated Resources: Enabled
[INFO] Receive SQE Recycling: Enabled
[INFO] === Simplified RDMA Server Starting ===
[INFO] Entering main accept loop for client connections...
[DEBUG] Waiting for incoming connection request...
[INFO] Client 1 connection request received
[DEBUG] Client connection details:
[DEBUG] client_id: 0x5f70f0f27dc0
[DEBUG] verbs: available
[DEBUG] Device: mlx5_1
[DEBUG] Calling rdma_accept to establish connection...
[SUCCESS] Client 1 connection accepted successfully
[DEBUG] Pre-allocating resources for client 1 with buffer pool size 8
[INFO] ConnectionResources initialized for connection 1 with pool size 8
[DEBUG] Initializing buffer pools for client 1...
[DEBUG] Buffer pools initialized with 8 buffers each
[DEBUG] Registering memory regions for client 1...
[INFO] Starting memory region registration for 8 buffers per type
[INFO] Registered send buffer 0 memory region
[INFO] Registered send buffer 1 memory region
[INFO] Registered send buffer 2 memory region
[INFO] Registered send buffer 3 memory region
[INFO] Registered send buffer 4 memory region
[INFO] Registered send buffer 5 memory region
[INFO] Registered send buffer 6 memory region
[INFO] Registered send buffer 7 memory region
[INFO] Registered receive buffer 0 memory region
[INFO] Registered receive buffer 1 memory region
[INFO] Registered receive buffer 2 memory region
[INFO] Registered receive buffer 3 memory region
[INFO] Registered receive buffer 4 memory region
[INFO] Registered receive buffer 5 memory region
[INFO] Registered receive buffer 6 memory region
[INFO] Registered receive buffer 7 memory region
[INFO] Successfully initialized 8 send and 8 receive memory regions
[SUCCESS] Pre-allocated 8 buffers and memory regions for client 1
[DEBUG] Storing connection resources for client 1...
[DEBUG] Connection resources stored successfully
[DEBUG] Creating dedicated worker thread for client 1...
[SUCCESS] Client 1 assigned to dedicated thread 129963178554944
[DEBUG] Waiting for incoming connection request...
[INFO] Worker thread 129963178554944 handling client 1
[INFO] Client 1 ready for high-performance request processing with SQE recycling
[INFO] Allocated receive buffer 0 (available: 7/8)
[INFO] Pre-posted receive SQE 0 for client 1
[INFO] Allocated receive buffer 1 (available: 6/8)
[INFO] Pre-posted receive SQE 1 for client 1
[INFO] Allocated receive buffer 2 (available: 5/8)
[INFO] Pre-posted receive SQE 2 for client 1
[INFO] Allocated receive buffer 3 (available: 4/8)
[INFO] Pre-posted receive SQE 3 for client 1
[INFO] Allocated receive buffer 4 (available: 3/8)
[INFO] Pre-posted receive SQE 4 for client 1
[INFO] Allocated receive buffer 5 (available: 2/8)
[INFO] Pre-posted receive SQE 5 for client 1
[INFO] Allocated receive buffer 6 (available: 1/8)
[INFO] Pre-posted receive SQE 6 for client 1
[INFO] Allocated receive buffer 7 (available: 8/8)
[INFO] Pre-posted receive SQE 7 for client 1
[INFO] Pre-posted 8 receive SQEs for client 1
[DEBUG] Client 1 buffer pools: send_buffer_pool.size()=8 recv_buffer_pool.size()=8 send_mr_pool.size()=8 recv_mr_pool.size()=8
[DEBUG] Received message from client 1 - Type: 1, Hash ID: 'test_hash_0', GDS Path: '/test/path/0', Result: 0
[INFO] Client 1 stored mapping: 'test_hash_0' -> '/test/path/0'
[INFO] Allocated send buffer 0 (available: 7/8)
[DEBUG] High latency breakdown for client 1 (total: 162.918 μs):
[DEBUG] Step 1 (buffer validation): 0.178 μs
[DEBUG] Step 2 (string processing): 0.997 μs
[DEBUG] Step 3 (storage lookup): 29.995 μs
[DEBUG] Step 4 (send buffer allocation): 10.18 μs
[DEBUG] Step 5 (response preparation): 0.918 μs
[DEBUG] Step 6 (rdma_post_send): 4.372 μs
[DEBUG] Step 7 (rdma_get_send_comp): 116.17 μs
[INFO] Successfully processed PUT request for client 1 in 162.918 μs
[INFO] Returned send buffer to pool (available: 7/8)
[INFO] Returned receive buffer to pool (available: 8/8)
[DEBUG] Received message from client 1 - Type: 1, Hash ID: 'test_hash_1', GDS Path: '/test/path/1', Result: 0
[INFO] Client 1 stored mapping: 'test_hash_1' -> '/test/path/1'
[INFO] Allocated send buffer 1 (available: 6/8)
[DEBUG] High latency breakdown for client 1 (total: 2.14712e+06 μs):
[DEBUG] Step 1 (buffer validation): 0.075 μs
[DEBUG] Step 2 (string processing): 0.365 μs
[DEBUG] Step 3 (storage lookup): 1.935 μs
[DEBUG] Step 4 (send buffer allocation): 13.123 μs
[DEBUG] Step 5 (response preparation): 0.522 μs
[DEBUG] Step 6 (rdma_post_send): 0.388 μs
[DEBUG] Step 7 (rdma_get_send_comp): 2.1471e+06 μs
[INFO] Successfully processed PUT request for client 1 in 2.14712e+06 μs
[INFO] Returned send buffer to pool (available: 6/8)
[INFO] Returned receive buffer to pool (available: 8/8)
[DEBUG] Received message from client 1 - Type: 1, Hash ID: 'test_hash_2', GDS Path: '/test/path/2', Result: 0
[INFO] Client 1 stored mapping: 'test_hash_2' -> '/test/path/2'
[INFO] Allocated send buffer 2 (available: 5/8)
[DEBUG] High latency breakdown for client 1 (total: 65.95 μs):
[DEBUG] Step 1 (buffer validation): 0.118 μs
[DEBUG] Step 2 (string processing): 0.732 μs
[DEBUG] Step 3 (storage lookup): 4.488 μs
[DEBUG] Step 4 (send buffer allocation): 10.23 μs
[DEBUG] Step 5 (response preparation): 0.825 μs
[DEBUG] Step 6 (rdma_post_send): 1.475 μs
[DEBUG] Step 7 (rdma_get_send_comp): 48.01 μs
[INFO] Successfully processed PUT request for client 1 in 65.95 μs
[INFO] Returned send buffer to pool (available: 5/8)
[INFO] Returned receive buffer to pool (available: 8/8)
[DEBUG] Received message from client 1 - Type: 1, Hash ID: 'test_hash_3', GDS Path: '/test/path/3', Result: 0
[INFO] Client 1 stored mapping: 'test_hash_3' -> '/test/path/3'
[INFO] Allocated send buffer 3 (available: 4/8)
[DEBUG] High latency breakdown for client 1 (total: 157.535 μs):
[DEBUG] Step 1 (buffer validation): 0.075 μs
[DEBUG] Step 2 (string processing): 0.122 μs
[DEBUG] Step 3 (storage lookup): 0.83 μs
[DEBUG] Step 4 (send buffer allocation): 7.995 μs
[DEBUG] Step 5 (response preparation): 0.393 μs
[DEBUG] Step 6 (rdma_post_send): 0.367 μs
[DEBUG] Step 7 (rdma_get_send_comp): 147.68 μs
[INFO] Successfully processed PUT request for client 1 in 157.535 μs
[INFO] Returned send buffer to pool (available: 4/8)
[INFO] Returned receive buffer to pool (available: 8/8)
[DEBUG] Received message from client 1 - Type: 1, Hash ID: 'test_hash_4', GDS Path: '/test/path/4', Result: 0
[INFO] Client 1 stored mapping: 'test_hash_4' -> '/test/path/4'
[INFO] Allocated send buffer 4 (available: 3/8)
[DEBUG] High latency breakdown for client 1 (total: 53.337 μs):
[DEBUG] Step 1 (buffer validation): 0.037 μs
[DEBUG] Step 2 (string processing): 0.061 μs
[DEBUG] Step 3 (storage lookup): 0.369 μs
[DEBUG] Step 4 (send buffer allocation): 3.965 μs
[DEBUG] Step 5 (response preparation): 0.32 μs
[DEBUG] Step 6 (rdma_post_send): 0.286 μs
[DEBUG] Step 7 (rdma_get_send_comp): 48.263 μs
[INFO] Successfully processed PUT request for client 1 in 53.337 μs
[INFO] Returned send buffer to pool (available: 3/8)
[INFO] Returned receive buffer to pool (available: 8/8)