Failed to create QP when adding a workq to a ctx

Hey guys. I'm trying to build a benchmark on the DOCA SDK, based mostly on the provided samples dma_copy_host and dma_copy_dpu. However, when I use more than 4 threads in the benchmark, the function doca_ctx_workq_add fails with the error "DMA work queue context unable to create QP. err=DOCA_ERROR_NO_MEMORY". With a configuration like 4 clients with 2 threads each, the benchmark does not hit this error. That seems quite strange, so I'm writing to ask for help.
You can find my benchmark on GitHub. I have also checked that the DPU can create more than 4 QPs under a normal RDMA benchmark, so I assume the DPU itself is working correctly.

Also, here’s the output of ulimit -a on my DPU.

```
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 63594
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 63594
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
```
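
Since `ulimit -a` reports the limits of the shell it runs in, the benchmark process could see different values if it is launched another way (for example via sudo or a service). A minimal sketch for checking this, using only Python's standard `resource` module, is below. The helper names `format_limit` and `read_limits` and the choice of which limits to print are my own assumptions, not anything from DOCA, and I don't know whether any of these limits is actually related to the QP error.

```python
import resource

# Limits that might plausibly matter for queue creation.
# This selection is an assumption, not something DOCA documents.
LIMITS = {
    "max locked memory": resource.RLIMIT_MEMLOCK,
    "open files": resource.RLIMIT_NOFILE,
    "max user processes": resource.RLIMIT_NPROC,
    "stack size": resource.RLIMIT_STACK,
}


def format_limit(value):
    """Render a raw rlimit value the way ulimit does for 'no limit'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_limits():
    """Return {name: (soft, hard)} for the current process.

    Note: getrlimit reports memory sizes in bytes, while `ulimit -a`
    prints them in kbytes, so the raw numbers differ by a factor of 1024.
    """
    return {
        name: tuple(format_limit(v) for v in resource.getrlimit(res))
        for name, res in LIMITS.items()
    }


if __name__ == "__main__":
    for name, (soft, hard) in read_limits().items():
        print(f"{name:20s} soft={soft} hard={hard}")
```

Running this script the same way the benchmark is started (same user, same launcher) should show whether the process sees the same limits as the shell output above.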