SNAP-4 xlio packet limitation

I am seeing the warning below from the SNAP-4 container on a BF3 during an NVMe/TCP test. It seems to be related to large fio block sizes (1M and above). Is there a configuration option or environment variable that enlarges the XLIO packet pool and prevents the subsequent timeouts and connection failures?

[2025-07-04 02:21:59.475959] nvme_nvda_tcp.c: 751:xlio_sock_get_packet: *WARNING*: Not enough xlio packets, using dynamic allocation. Performance may be degraded
[2025-07-04 02:22:12.436704] bdev_nvme.c:5257:timeout_cb: *WARNING*: [nqn.2025-03.io.spdk:cnode1, 1] Warning: Detected a timeout. ctrlr=0xaaab0100a010 qpair=0x200004a09bc0 cid=12
[2025-07-04 02:22:12.437328] bdev_nvme.c:5257:timeout_cb: *WARNING*: [nqn.2025-03.io.spdk:cnode1, 1] Warning: Detected a timeout. ctrlr=0xaaab0100a010 qpair=0x200004a09bc0 cid=13
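
The only pool-related knobs I have found so far are the XLIO buffer environment variables in the pod spec (item 2 below). This is a sketch of what I plan to try next; the larger values are guesses and not validated:

# Hypothetical tuning of the env section in xlio.yaml (values are guesses):
- name: XLIO_RX_BUFS
  value: "16384"            # currently 8192
- name: XLIO_TX_BUFS
  value: "16384"            # currently 8192
- name: SNAP_MEMPOOL_SIZE_MB
  value: "16384"            # currently 8192; not sure this one is related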

Here are the configurations I used to run SNAP-4 with XLIO:

  1. mlxconfig
$ sudo mlxconfig -d /dev/mst/mt41692_pciconf0 -e q | grep -i \*
*       NVME_EMULATION_ENABLE                       False(0)             True(1)              True(1)
*       NVME_EMULATION_NUM_PF                       1                    2                    2
*       NVME_EMULATION_NUM_MSIX                     0                    64                   0
*       VIRTIO_BLK_EMULATION_NUM_MSIX               2                    0                    2
*       VIRTIO_FS_EMULATION_NUM_MSIX                2                    0                    2
*       VIRTIO_NET_EMULATION_NUM_MSIX               2                    0                    2
*       PER_PF_NUM_SF                               False(0)             True(1)              True(1)
*       NVME_EMU_MNG_ENABLE                         False(0)             True(1)              False(0)
*       NVME_EMU_MNG_NUM_PF                         1                    2                    1
*       PF_TOTAL_SF                                 0                    32                   2
*       PF_SF_BAR_SIZE                              0                    8                    8
The '*' shows parameters with next value different from default/current value.
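
(The three value columns above are Default, Current, and Next Boot.) For completeness, this is roughly how the starred next-boot values were applied; a sketch from memory, NVMe-related subset only, and the Next Boot values only take effect after a firmware reset or host power cycle:

# Sketch (from memory) of the mlxconfig set commands behind the Next Boot column above.
sudo mlxconfig -d /dev/mst/mt41692_pciconf0 s \
    NVME_EMULATION_ENABLE=1 \
    NVME_EMULATION_NUM_PF=2 \
    PER_PF_NUM_SF=1 \
    PF_TOTAL_SF=2 \
    PF_SF_BAR_SIZE=8
sudo mlxfwreset -d /dev/mst/mt41692_pciconf0 reset    # or power-cycle the host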
  2. xlio.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: snap
spec:
  hostNetwork: true
  containers:
  - name: snap
    image: nvcr.io/nvidia/doca/doca_snap:4.7.0-doca3.0.0
    imagePullPolicy: IfNotPresent
    securityContext:
      privileged: true
      capabilities:
        add: ["IPC_LOCK", "SYS_RAWIO", "SYS_NICE"]
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepages
    - mountPath: /dev/shm
      name: shm
    - mountPath: /dev/infiniband
      name: infiniband
    - mountPath: /dev/vfio
      name: vfio
    - mountPath: /etc/nvda_snap
      name: conf
    - mountPath: /var/log/snap-log
      name: snap-log
    resources:
      requests:
        memory: "4Gi"
        cpu: "8"
      limits:
        hugepages-2Mi: "12Gi"
        memory: "16Gi"
        cpu: "16"
    env:
      ## To enable XLIO un-comment SPDK_XLIO_PATH
      ## App-Specific command line arguments
      - name: APP_ARGS
        value: "--wait-for-rpc"
      - name: SPDK_XLIO_PATH
        value: "/usr/lib/libxlio.so"
      #- name: SPDK_RPC_INIT_CONF_JSON
      #  value: "/etc/nvda_snap/config.json"
      - name: SPDK_RPC_INIT_CONF
        value: "/etc/nvda_snap/spdk_rpc_init.conf"
      - name: SNAP_RPC_INIT_CONF
        value: "/etc/nvda_snap/snap_rpc_init.conf"
      - name: XLIO_RX_BUFS
        value: "8192"
      - name: XLIO_TX_BUFS
        value: "8192"
      - name: SNAP_MEMPOOL_SIZE_MB
        value: "8192"
  volumes:
  - name: hugepages
    emptyDir:
      medium: HugePages
  - name: shm
    hostPath:
      path: /dev/shm
  - name: infiniband
    hostPath:
      path: /dev/infiniband
  - name: vfio
    hostPath:
      path: /dev/vfio
  - name: conf
    hostPath:
      path: /etc/nvda_snap
  - name: snap-log
    hostPath:
      path: /var/log/snap-log
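
For reference, this is how I allocate hugepages and deploy the pod on the Arm side (a sketch of my setup; paths and counts may differ elsewhere):

# Reserve 2MiB hugepages to cover the 12Gi limit in the pod spec (12Gi / 2MiB = 6144 pages).
echo 6144 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# In my setup the standalone kubelet on the DPU picks up static pod manifests from /etc/kubelet.d.
sudo cp xlio.yaml /etc/kubelet.d/
# Confirm the SNAP container is up.
sudo crictl ps | grep snap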
  3. SNAP-4 RPC
spdk_rpc.py sock_set_default_impl -i xlio
spdk_rpc.py framework_start_init
spdk_rpc.py bdev_nvme_set_options --transport-ack-timeout 12

spdk_rpc.py bdev_nvme_attach_controller -b Nvme0 -t nvda_tcp -a 105.22.0.101 -f ipv4 -s 4420 -n nqn.2025-03.io.spdk:cnode0
spdk_rpc.py bdev_nvme_attach_controller -b Nvme1 -t nvda_tcp -a 105.22.1.101 -f ipv4 -s 4420 -n nqn.2025-03.io.spdk:cnode1

snap_rpc.py nvme_subsystem_create --nqn nqn.2022-10.io.nvda.nvme:0
snap_rpc.py nvme_subsystem_create --nqn nqn.2022-10.io.nvda.nvme:1

snap_rpc.py nvme_namespace_create -b Nvme0n1 -n 1 --nqn nqn.2022-10.io.nvda.nvme:0 --uuid 3d9c3b54-5c31-410a-b4f0-7cf2afd9e111
snap_rpc.py nvme_namespace_create -b Nvme1n1 -n 2 --nqn nqn.2022-10.io.nvda.nvme:1 --uuid 3d9c3b54-5c31-410a-b4f0-7cf2afd9e112

snap_rpc.py nvme_controller_create --nqn nqn.2022-10.io.nvda.nvme:0 --ctrl NVMeCtrl1 --pf_id 0 --suspended -n 31
snap_rpc.py nvme_controller_create --nqn nqn.2022-10.io.nvda.nvme:1 --ctrl NVMeCtrl2 --pf_id 1 --suspended -n 31

snap_rpc.py nvme_controller_attach_ns -c NVMeCtrl1 -n 1
snap_rpc.py nvme_controller_attach_ns -c NVMeCtrl2 -n 2

snap_rpc.py nvme_controller_resume -c NVMeCtrl1
snap_rpc.py nvme_controller_resume -c NVMeCtrl2
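
After these RPCs I sanity-check the setup from inside the SNAP container; a sketch of the list RPCs I normally use (output omitted):

# Sketch: verify the remote bdevs and the emulated controllers after resume.
spdk_rpc.py bdev_nvme_get_controllers     # Nvme0/Nvme1 connected over nvda_tcp
spdk_rpc.py bdev_get_bdevs                # Nvme0n1/Nvme1n1 present
snap_rpc.py nvme_controller_list          # NVMeCtrl1/NVMeCtrl2 state
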
XLIO startup log printed by the SNAP container (for reference):
 XLIO INFO   : ---------------------------------------------------------------------------
 XLIO INFO   : XLIO_VERSION: 3.50.3-1 Release built on Mar 31 2025 07:01:18
 XLIO INFO   : Git: 0d8f272ca2ac1440db92a477075578d9ec5bf8cb
 XLIO INFO   : Cmd Line: /opt/nvidia/nvda_snap/bin/snap_service --wait-for-rpc -r /var/tmp/spdk.sock
 XLIO INFO   : OFED Version: OFED-internal-25.04-0.6.1:
 XLIO INFO   : ---------------------------------------------------------------------------
 XLIO INFO   : Spec                           NVMEoTCP Profile for BF3   [XLIO_SPEC]
 XLIO INFO   : Log Level                      INFO                       [XLIO_TRACELEVEL]
 XLIO INFO   : Ring On Device Memory TX       1024                       [XLIO_RING_DEV_MEM_TX]
 XLIO INFO   : Tx QP WRE                      1024                       [XLIO_TX_WRE]
 XLIO INFO   : Tx QP WRE Batching             128                        [XLIO_TX_WRE_BATCHING]
 XLIO INFO   : Tx Bufs Batch TCP              1                          [XLIO_TX_BUFS_BATCH_TCP]
 XLIO INFO   : Rx QP WRE                      32                         [XLIO_RX_WRE]
 XLIO INFO   : Rx Prefetch Bytes Before Poll  256                        [XLIO_RX_PREFETCH_BYTES_BEFORE_POLL]
 XLIO INFO   : GRO max streams                0                          [XLIO_GRO_STREAMS_MAX]
 XLIO INFO   : STRQ Strides per RWQE          8192                       [XLIO_STRQ_NUM_STRIDES]
 XLIO INFO   : CQ Drain Thread                Disabled                   [XLIO_PROGRESS_ENGINE_INTERVAL]
 XLIO INFO   : CQ Adaptive Moderation         Disabled                   [XLIO_CQ_AIM_INTERVAL_MSEC]
 XLIO INFO   : CQ Keeps QP Full               Disabled                   [XLIO_CQ_KEEP_QP_FULL]
 XLIO INFO   : QP Compensation Level          8                          [XLIO_QP_COMPENSATION_LEVEL]
 XLIO INFO   : TCP nodelay                    1                          [XLIO_TCP_NODELAY]
 XLIO INFO   : Avoid sys-calls on tcp fd      Enabled                    [XLIO_AVOID_SYS_CALLS_ON_TCP_FD]
 XLIO INFO   : Internal Thread Affinity       0x01                       [XLIO_INTERNAL_THREAD_AFFINITY]
 XLIO INFO   : Memory limit                   256 MB                     [XLIO_MEMORY_LIMIT]
 XLIO INFO   : Memory limit (user allocator)  2 GB                       [XLIO_MEMORY_LIMIT_USER]
 XLIO INFO   : SocketXtreme mode              Enabled                    [XLIO_SOCKETXTREME]
 XLIO INFO   : TSO support                    Enabled                    [XLIO_TSO]
 XLIO INFO   : LRO support                    Enabled                    [XLIO_LRO]
 XLIO INFO   : fork() support                 Disabled                   [XLIO_FORK]
 XLIO INFO   : TCP abort on close             Enabled                    [XLIO_TCP_ABORT_ON_CLOSE]
 XLIO INFO   : ---------------------------------------------------------------------------
  4. nvmf_tgt
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock iobuf_set_options \
  --small-pool-count 32767 --large-pool-count 16383
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock framework_start_init
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock \
  nvmf_create_transport \
  --trtype TCP \
  --max-queue-depth 128 \
  --max-io-qpairs-per-ctrlr 127 \
  --in-capsule-data-size 8192 \
  --io-unit-size 8192 \
  --max-aq-depth 128 \
  --num-shared-buffers 8192 \
  --buf-cache-size 32 \
  --sock-priority 0 \
  --abort-timeout-sec 1
for i in {0..1};
do
  sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock \
    bdev_null_create "Nullb"$i 65536 1024
done
for i in {0..1};
do
  sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock \
    nvmf_create_subsystem "nqn.2025-03.io.spdk:cnode"$i -a \
    -s "SPDK0000000000000"$i -d "SPDK_Controller"$i
done
for i in {0..1};
do
  sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock \
    nvmf_subsystem_add_ns "nqn.2025-03.io.spdk:cnode"$i "Nullb"$i
done
for i in {0..1};
do
  sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock \
    nvmf_subsystem_add_listener "nqn.2025-03.io.spdk:cnode"$i -t tcp -a 105.22.$i.101 -s 4420
done
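
To confirm the target side, I list the subsystems and transports over the same RPC socket (sketch, output omitted):

# Sketch: verify both subsystems, their namespaces/listeners, and the TCP transport.
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock nvmf_get_subsystems
sudo scripts/rpc.py -s /var/tmp/spdk-mango-00.sock nvmf_get_transports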
  5. fio
$ cat fio_config.fio
[global]
ioengine=libaio
direct=1

group_reporting=1
random_generator=tausworthe64
time_based=1
runtime=100
rw=randread
bs=1M
numjobs=32
iodepth=128
rwmixread=50
cpus_allowed_policy=split

[job0]
filename=/dev/nvme0n1
cpus_allowed=0-15,48-63

[job1]
filename=/dev/nvme1n1
cpus_allowed=16-47
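
For scale: each job section spawns numjobs=32 clones at iodepth=128 with bs=1M, so the worst case is roughly 32 * 128 * 1 MiB = 4 GiB of data in flight per emulated controller, which I suspect is what drains the preallocated XLIO packet pool. The job file is run on the host with:

sudo fio fio_config.fio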

$ lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda       8:0    0 447.1G  0 disk
├─sda1    8:1    0     1G  0 part
└─sda2    8:2    0 446.1G  0 part
sdb       8:16   0 931.5G  0 disk
├─sdb1    8:17   0     1G  0 part /boot/efi
└─sdb2    8:18   0 930.5G  0 part /
nvme0n1 259:0    0    64G  0 disk
nvme1n1 259:1    0    64G  0 disk