I saw this error in the SNAP-4 container on BF3 during an NVMe/TCP test. It might be related to fio block sizes larger than 1M. Is there a configuration option or environment variable that enlarges the XLIO packet pool and prevents the connection failure?
[2025-07-04 02:21:59.475959] nvme_nvda_tcp.c: 751:xlio_sock_get_packet: *WARNING*: Not enough xlio packets, using dynamic allocation. Performance may be degraded
[2025-07-04 02:22:12.436704] bdev_nvme.c:5257:timeout_cb: *WARNING*: [nqn.2025-03.io.spdk:cnode1, 1] Warning: Detected a timeout. ctrlr=0xaaab0100a010 qpair=0x200004a09bc0 cid=12
[2025-07-04 02:22:12.437328] bdev_nvme.c:5257:timeout_cb: *WARNING*: [nqn.2025-03.io.spdk:cnode1, 1] Warning: Detected a timeout. ctrlr=0xaaab0100a010 qpair=0x200004a09bc0 cid=13
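The only packet/buffer knobs I have found so far are the XLIO_RX_BUFS / XLIO_TX_BUFS variables already set in the pod spec below. A minimal sketch of what I would try next, assuming these are the pools the warning refers to (16384 is just a guess, not a validated value):

# candidate change in the snap pod environment (values are guesses, current setting is 8192)
XLIO_RX_BUFS=16384
XLIO_TX_BUFS=16384

Please correct me if a different variable (for example the XLIO memory limits shown in the startup banner further down) is the right one to grow.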
Here are the configurations I set to run SNAP-4 with XLIO:
mlxconfig
$ sudo mlxconfig -d /dev/mst/mt41692_pciconf0 -e q | grep -i \*
* NVME_EMULATION_ENABLE False(0) True(1) True(1)
* NVME_EMULATION_NUM_PF 1 2 2
* NVME_EMULATION_NUM_MSIX 0 64 0
* VIRTIO_BLK_EMULATION_NUM_MSIX 2 0 2
* VIRTIO_FS_EMULATION_NUM_MSIX 2 0 2
* VIRTIO_NET_EMULATION_NUM_MSIX 2 0 2
* PER_PF_NUM_SF False(0) True(1) True(1)
* NVME_EMU_MNG_ENABLE False(0) True(1) False(0)
* NVME_EMU_MNG_NUM_PF 1 2 1
* PF_TOTAL_SF 0 32 2
* PF_SF_BAR_SIZE 0 8 8
The '*' shows parameters with next value different from default/current value.
xlio.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: snap
spec:
  hostNetwork: true
  containers:
  - name: snap
    image: nvcr.io/nvidia/doca/doca_snap:4.7.0-doca3.0.0
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
      capabilities:
        add: ["IPC_LOCK", "SYS_RAWIO", "SYS_NICE"]
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepages
    - mountPath: /dev/shm
      name: shm
    - mountPath: /dev/infiniband
      name: infiniband
    - mountPath: /dev/vfio
      name: vfio
    - mountPath: /etc/nvda_snap
      name: conf
    - mountPath: /var/log/snap-log
      name: snap-log
    resources:
      requests:
        memory: "4Gi"
        cpu: "8"
      limits:
        hugepages-2Mi: "12Gi"
        memory: "16Gi"
        cpu: "16"
    env:
      ## To enable XLIO un-comment SPDK_XLIO_PATH
      ## App-Specific command line arguments
      - name: APP_ARGS
        value: "--wait-for-rpc"
      - name: SPDK_XLIO_PATH
        value: "/usr/lib/libxlio.so"
      #- name: SPDK_RPC_INIT_CONF_JSON
      #  value: "/etc/nvda_snap/config.json"
      - name: SPDK_RPC_INIT_CONF
        value: "/etc/nvda_snap/spdk_rpc_init.conf"
      - name: SNAP_RPC_INIT_CONF
        value: "/etc/nvda_snap/snap_rpc_init.conf"
      - name: XLIO_RX_BUFS
        value: "8192"
      - name: XLIO_TX_BUFS
        value: "8192"
      - name: SNAP_MEMPOOL_SIZE_MB
        value: "8192"
  volumes:
  - name: hugepages
    emptyDir:
      medium: HugePages
  - name: shm
    hostPath:
      path: /dev/shm
  - name: infiniband
    hostPath:
      path: /dev/infiniband
  - name: vfio
    hostPath:
      path: /dev/vfio
  - name: conf
    hostPath:
      path: /etc/nvda_snap
  - name: snap-log
    hostPath:
      path: /var/log/snap-log
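To rule out the env vars simply not reaching the service, this is how I check them inside the running container (assuming the pod runs as a kubelet static pod on the DPU, so crictl is available; the container name matches the yaml above):

# find the snap container and dump the XLIO/SNAP related environment
CID=$(sudo crictl ps --name snap -q)
sudo crictl exec "$CID" env | grep -E 'XLIO|SNAP_MEMPOOL'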
SNAP-4 RPC
spdk_rpc.py sock_set_default_impl -i xlio
spdk_rpc.py framework_start_init
spdk_rpc.py bdev_nvme_set_options --transport-ack-timeout 12
spdk_rpc.py bdev_nvme_attach_controller -b Nvme0 -t nvda_tcp -a 105.22.0.101 -f ipv4 -s 4420 -n nqn.2025-03.io.spdk:cnode0
spdk_rpc.py bdev_nvme_attach_controller -b Nvme1 -t nvda_tcp -a 105.22.1.101 -f ipv4 -s 4420 -n nqn.2025-03.io.spdk:cnode1
snap_rpc.py nvme_subsystem_create --nqn nqn.2022-10.io.nvda.nvme:0
snap_rpc.py nvme_subsystem_create --nqn nqn.2022-10.io.nvda.nvme:1
snap_rpc.py nvme_namespace_create -b Nvme0n1 -n 1 --nqn nqn.2022-10.io.nvda.nvme:0 --uuid 3d9c3b54-5c31-410a-b4f0-7cf2afd9e111
snap_rpc.py nvme_namespace_create -b Nvme1n1 -n 2 --nqn nqn.2022-10.io.nvda.nvme:1 --uuid 3d9c3b54-5c31-410a-b4f0-7cf2afd9e112
snap_rpc.py nvme_controller_create --nqn nqn.2022-10.io.nvda.nvme:0 --ctrl NVMeCtrl1 --pf_id 0 --suspended -n 31
snap_rpc.py nvme_controller_create --nqn nqn.2022-10.io.nvda.nvme:1 --ctrl NVMeCtrl2 --pf_id 1 --suspended -n 31
snap_rpc.py nvme_controller_attach_ns -c NVMeCtrl1 -n 1
snap_rpc.py nvme_controller_attach_ns -c NVMeCtrl2 -n 2
snap_rpc.py nvme_controller_resume -c NVMeCtrl1
snap_rpc.py nvme_controller_resume -c NVMeCtrl2
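After the resume calls I verify the setup with the list/get RPCs (RPC names as I understand them from the SPDK and SNAP-4 docs; treat this as a sketch):

# remote NVMe/TCP controllers attached on the SNAP side
spdk_rpc.py bdev_nvme_get_controllers
# emulated subsystems and controllers exposed to the host (SNAP RPC names assumed)
snap_rpc.py nvme_subsystem_list
snap_rpc.py nvme_controller_list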
XLIO INFO : ---------------------------------------------------------------------------
XLIO INFO : XLIO_VERSION: 3.50.3-1 Release built on Mar 31 2025 07:01:18
XLIO INFO : Git: 0d8f272ca2ac1440db92a477075578d9ec5bf8cb
XLIO INFO : Cmd Line: /opt/nvidia/nvda_snap/bin/snap_service --wait-for-rpc -r /var/tmp/spdk.sock
XLIO INFO : OFED Version: OFED-internal-25.04-0.6.1:
XLIO INFO : ---------------------------------------------------------------------------
XLIO INFO : Spec NVMEoTCP Profile for BF3 [XLIO_SPEC]
XLIO INFO : Log Level INFO [XLIO_TRACELEVEL]
XLIO INFO : Ring On Device Memory TX 1024 [XLIO_RING_DEV_MEM_TX]
XLIO INFO : Tx QP WRE 1024 [XLIO_TX_WRE]
XLIO INFO : Tx QP WRE Batching 128 [XLIO_TX_WRE_BATCHING]
XLIO INFO : Tx Bufs Batch TCP 1 [XLIO_TX_BUFS_BATCH_TCP]
XLIO INFO : Rx QP WRE 32 [XLIO_RX_WRE]
XLIO INFO : Rx Prefetch Bytes Before Poll 256 [XLIO_RX_PREFETCH_BYTES_BEFORE_POLL]
XLIO INFO : GRO max streams 0 [XLIO_GRO_STREAMS_MAX]
XLIO INFO : STRQ Strides per RWQE 8192 [XLIO_STRQ_NUM_STRIDES]
XLIO INFO : CQ Drain Thread Disabled [XLIO_PROGRESS_ENGINE_INTERVAL]
XLIO INFO : CQ Adaptive Moderation Disabled [XLIO_CQ_AIM_INTERVAL_MSEC]
XLIO INFO : CQ Keeps QP Full Disabled [XLIO_CQ_KEEP_QP_FULL]
XLIO INFO : QP Compensation Level 8 [XLIO_QP_COMPENSATION_LEVEL]
XLIO INFO : TCP nodelay 1 [XLIO_TCP_NODELAY]
XLIO INFO : Avoid sys-calls on tcp fd Enabled [XLIO_AVOID_SYS_CALLS_ON_TCP_FD]
XLIO INFO : Internal Thread Affinity 0x01 [XLIO_INTERNAL_THREAD_AFFINITY]
XLIO INFO : Memory limit 256 MB [XLIO_MEMORY_LIMIT]
XLIO INFO : Memory limit (user allocator) 2 GB [XLIO_MEMORY_LIMIT_USER]
XLIO INFO : SocketXtreme mode Enabled [XLIO_SOCKETXTREME]
XLIO INFO : TSO support Enabled [XLIO_TSO]
XLIO INFO : LRO support Enabled [XLIO_LRO]
XLIO INFO : fork() support Disabled [XLIO_FORK]
XLIO INFO : TCP abort on close Enabled [XLIO_TCP_ABORT_ON_CLOSE]
XLIO INFO : ---------------------------------------------------------------------------
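If more detail would help, I can rerun with more verbose XLIO tracing; a sketch, assuming the trace-level variable shown in the banner above accepts a level higher than INFO:

# added to the snap pod environment for a debug run (level name assumed)
XLIO_TRACELEVEL=DEBUG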
nvmf_tgt
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock iobuf_set_options \
--small-pool-count 32767 --large-pool-count 16383
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock framework_start_init
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock \
nvmf_create_transport \
--trtype TCP \
--max-queue-depth 128 \
--max-io-qpairs-per-ctrlr 127 \
--in-capsule-data-size 8192 \
--io-unit-size 8192 \
--max-aq-depth 128 \
--num-shared-buffers 8192 \
--buf-cache-size 32 \
--sock-priority 0 \
--abort-timeout-sec 1
for i in {0..1};
do
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock \
bdev_null_create "Nullb"$i 65536 1024
done
for i in {0..1};
do
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock \
nvmf_create_subsystem "nqn.2025-03.io.spdk:cnode"$i -a \
-s "SPDK0000000000000"$i -d "SPDK_Controller"$i
done
for i in {0..1};
do
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock \
nvmf_subsystem_add_ns "nqn.2025-03.io.spdk:cnode"$i "Nullb"$i
done
for i in {0..1};
do
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock \
nvmf_subsystem_add_listener "nqn.2025-03.io.spdk:cnode"$i -t tcp -a 105.22.$i.101 -s 4420
done
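For reference, the target side can be inspected after the loops with the standard SPDK get RPCs (nothing SNAP-specific here):

sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock nvmf_get_transports
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock nvmf_get_subsystems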
fio
$ cat fio_config.fio
[global]
ioengine=libaio
direct=1
group_reporting=1
random_generator=tausworthe64
time_based=1
runtime=100
rw=randread
bs=1M
numjobs=32
iodepth=128
rwmixread=50
cpus_allowed_policy=split
[job0]
filename=/dev/nvme0n1
cpus_allowed=0-15,48-63
[job1]
filename=/dev/nvme1n1
cpus_allowed=16-47
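To reproduce with block sizes above 1M (where I suspect the problem starts), I run the same job with bs raised; a sketch of the variant (2M is just an example value):

# copy of the job file with bs bumped above 1M, then run it
sed 's/^bs=1M/bs=2M/' fio_config.fio > fio_config_2m.fio
sudo fio fio_config_2m.fio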
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 447.1G 0 disk
├─sda1 8:1 0 1G 0 part
└─sda2 8:2 0 446.1G 0 part
sdb 8:16 0 931.5G 0 disk
├─sdb1 8:17 0 1G 0 part /boot/efi
└─sdb2 8:18 0 930.5G 0 part /
nvme0n1 259:0 0 64G 0 disk
nvme1n1 259:1 0 64G 0 disk