DOCA GPUNetIO CUDA kernel doesn't run in parallel with NVIDIA driver 545

I am trying to run a UDP packet processing program using DOCA 2.2 and DOCA 2.5 with a ConnectX-7, but I found that the kernels don’t run in parallel in some environments. Here is which environments work and which don’t:

  • DOCA 2.2 and NVIDIA driver 530 (CUDA 12.1): Works
  • DOCA 2.2/2.5 and NVIDIA driver 545 (CUDA 12.3): Doesn’t work

Here is the structure I implemented. There are two kernels: one receives UDP packets, and the other constructs data from the received payload. They communicate through semaphores provided by DOCA and are launched in different CUDA streams.

|receive packet| --semaphore--> |construct data from received payload|
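A minimal plain-CUDA sketch of this layout, for illustration only. It does not use the DOCA API: a hypothetical ready counter in device memory stands in for the DOCA GPU semaphore, and the hypothetical producer_kernel and consumer_kernel stand in for the receive and construct kernels. The consumer spins until the producer has published each item; with a persistent receive kernel, as in the real program, that means both kernels must be resident on the GPU at the same time.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for the receive kernel: writes n items and publishes
// each one by bumping a "ready" counter (a stand-in for the DOCA semaphore).
__global__ void producer_kernel(int *items, unsigned int *ready, int n) {
    if (threadIdx.x != 0) return;  // one thread keeps the sketch short
    for (int i = 0; i < n; ++i) {
        items[i] = i * 2;      // stand-in for a received payload
        __threadfence();       // make items[i] visible before publishing it
        atomicAdd(ready, 1u);
    }
}

// Hypothetical stand-in for the construct-data kernel: waits for each item.
__global__ void consumer_kernel(const int *items, const unsigned int *ready,
                                int n, long long *sum) {
    if (threadIdx.x != 0) return;
    long long acc = 0;
    for (int i = 0; i < n; ++i) {
        while (*(volatile const unsigned int *)ready <= (unsigned int)i) { }  // wait for item i
        __threadfence();
        acc += ((volatile const int *)items)[i];  // volatile: don't reuse a stale cached copy
    }
    *sum = acc;
}

// Launches the two kernels on separate non-blocking streams.
// Returns the consumer's sum, or -1 on a CUDA error.
long long run_two_stream_pipeline(int n) {
    assert(n > 0);
    int *items = nullptr;
    unsigned int *ready = nullptr;
    long long *d_sum = nullptr;
    long long h_sum = -1;
    if (cudaMalloc(&items, n * sizeof(int)) == cudaSuccess &&
        cudaMalloc(&ready, sizeof(unsigned int)) == cudaSuccess &&
        cudaMalloc(&d_sum, sizeof(long long)) == cudaSuccess) {
        cudaMemset(ready, 0, sizeof(unsigned int));
        cudaDeviceSynchronize();  // non-blocking streams do not wait for the default stream

        cudaStream_t s_rx, s_proc;
        cudaStreamCreateWithFlags(&s_rx, cudaStreamNonBlocking);
        cudaStreamCreateWithFlags(&s_proc, cudaStreamNonBlocking);
        // Producer first: it never waits on the consumer, so the sketch still
        // terminates if the two kernels end up serialized in launch order.
        producer_kernel<<<1, 32, 0, s_rx>>>(items, ready, n);
        consumer_kernel<<<1, 32, 0, s_proc>>>(items, ready, n, d_sum);
        if (cudaDeviceSynchronize() == cudaSuccess)
            cudaMemcpy(&h_sum, d_sum, sizeof(h_sum), cudaMemcpyDeviceToHost);
        cudaStreamDestroy(s_rx);
        cudaStreamDestroy(s_proc);
    }
    cudaFree(items);  // cudaFree(nullptr) is a no-op
    cudaFree(ready);
    cudaFree(d_sum);
    return h_sum;
}
```

Calling run_two_stream_pipeline(1000) should return 999000 (the sum of 2*i for i < 1000). In this sketch the producer is launched first and never waits on the consumer, so it finishes even if the kernels are serialized; a receive kernel that loops forever has no such escape hatch.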

I expect the two kernels to run in parallel, but they don’t: the second one does not seem to be launched.
When I merge them into a single kernel and run each function in a different block, it works.
I also confirmed that the program works correctly with DOCA 2.2 and NVIDIA driver 530.30.02, but not with DOCA 2.2/2.5 and NVIDIA driver 545.23.08.
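For comparison, a sketch of the merged variant that does work: one hypothetical kernel, merged_kernel, in which block 0 takes the receive role and block 1 the construct role, again with a hypothetical ready counter instead of the DOCA semaphore. Only one kernel is launched, so nothing depends on a second launch being scheduled. Note that blocks of a single ordinary launch are not formally guaranteed to be co-resident; with two small blocks on an A100 they are in practice, and cudaLaunchCooperativeKernel is the API to use when a guarantee is needed.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical merged kernel: block 0 plays the "receive" role and block 1 the
// "construct data" role, sharing a ready counter in device memory.
__global__ void merged_kernel(int *items, unsigned int *ready, int n, long long *sum) {
    if (threadIdx.x != 0) return;  // one thread per role keeps the sketch short
    if (blockIdx.x == 0) {
        for (int i = 0; i < n; ++i) {
            items[i] = i * 2;      // stand-in for a received payload
            __threadfence();       // make items[i] visible before publishing it
            atomicAdd(ready, 1u);
        }
    } else if (blockIdx.x == 1) {
        long long acc = 0;
        for (int i = 0; i < n; ++i) {
            while (*(volatile unsigned int *)ready <= (unsigned int)i) { }  // wait for item i
            __threadfence();
            acc += ((volatile int *)items)[i];  // volatile: don't reuse a stale cached copy
        }
        *sum = acc;
    }
}

// Returns the sum computed by block 1, or -1 on a CUDA error.
long long run_merged(int n) {
    assert(n > 0);
    int *items = nullptr;
    unsigned int *ready = nullptr;
    long long *d_sum = nullptr;
    long long h_sum = -1;
    if (cudaMalloc(&items, n * sizeof(int)) == cudaSuccess &&
        cudaMalloc(&ready, sizeof(unsigned int)) == cudaSuccess &&
        cudaMalloc(&d_sum, sizeof(long long)) == cudaSuccess) {
        cudaMemset(ready, 0, sizeof(unsigned int));        // same (default) stream as the launch
        merged_kernel<<<2, 32>>>(items, ready, n, d_sum);   // one launch, two roles
        if (cudaDeviceSynchronize() == cudaSuccess)
            cudaMemcpy(&h_sum, d_sum, sizeof(h_sum), cudaMemcpyDeviceToHost);
    }
    cudaFree(items);  // cudaFree(nullptr) is a no-op
    cudaFree(ready);
    cudaFree(d_sum);
    return h_sum;
}
```

run_merged(1000) should also return 999000.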

The environment that doesn’t work is:

Ubuntu 22.04 with Linux kernel 6.5.0-15-generic, DOCA 2.2/2.5, NVIDIA driver 545.23.08

$ nvidia-smi # I tried both the legacy driver and the open driver, with no improvement.
Thu Feb  8 19:49:52 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  | 00000000:81:00.0 Off |                    0 |
| N/A   24C    P0              34W / 250W |      4MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

$ dpkg -l | grep doca
ii  doca-apps                             2.5.0108-1                               amd64        DOCA-Based reference applications
ii  doca-apps-dev                         2.5.0108-1                               amd64        Development files for DOCA Apps
ii  doca-gpu                              2.5.0108-1                               amd64        GPU runtime capabilities for DOCA
ii  doca-gpu-dev                          2.5.0108-1                               amd64        Development files for DOCA GPU
ii  doca-grpc                             2.5.0108-1                               amd64        gRPC runtime capabilities for DOCA
ii  doca-grpc-dev                         2.5.0108-1                               amd64        Development files for DOCA grpc
ii  doca-host-repo-ubuntu2204             2.5.0-0.0.1.2.5.0108.1.23.10.1.1.9.0     amd64        Doca repo bundle package
ii  doca-libs                             2.5.0108-1                               amd64        Data Center on a Chip Architecture (DOCA)
ii  doca-prime-runtime                    2.5.0108-1                               amd64        DOCA prime runtime metapackage
ii  doca-prime-sdk                        2.5.0108-1                               amd64        DOCA prime sdk metapackage
ii  doca-prime-tools                      2.5.0108-1                               amd64        Runtime DOCA Tools
ii  doca-runtime                          2.5.0-0.0.1                              amd64        doca-runtime meta-package
ii  doca-samples                          2.5.0108-1                               amd64        DOCA Samples
ii  doca-sdk                              2.5.0-0.0.1                              amd64        doca-sdk meta-package
ii  doca-services                         2.5.0108-1                               amd64        Runtime utilities associated with DOCA Services
ii  doca-tools                            2.5.0-0.0.1                              amd64        doca-tools meta-package
ii  libdoca-libs-dev                      2.5.0108-1                               amd64        Development files for DOCA Libs

ConnectX7 firmware
$ sudo mlxfwmanager
  Device Type:      ConnectX7
  Part Number:      MCX755106AS-HEA_Ax
  Description:      NVIDIA ConnectX-7 HHHL Adapter Card; 200GbE (default mode) / NDR200 IB; Dual-port QSFP112; PCIe 5.0 x16 with x16 PCIe extension option; Crypto Disabled; Secure Boot Enabled
  PSID:             MT_0000000834
  PCI Device Name:  0000:a1:00.0
  Base MAC:         a088c234ad46
  Versions:         Current        Available
     FW             28.38.1900     N/A
     PXE            3.7.0201       N/A
     UEFI           14.31.0020     N/A

And the correctly working environment is:

Ubuntu 22.04 with Linux kernel 5.15.0-92-generic, DOCA 2.2, NVIDIA driver 530.30.02

$ nvidia-smi
Thu Feb  8 20:32:04 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-PCIE-40GB           On | 00000000:81:00.0 Off |                    0 |
| N/A   23C    P0               33W / 250W|      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

$ dpkg -l | grep doca
ii  doca-apps                             2.2.0080-1                               amd64        DOCA-Based reference applications
ii  doca-apps-dev                         2.2.0080-1                               amd64        Development files for DOCA Apps
ii  doca-gpu                              2.2.0080-1                               amd64        GPU runtime capabilities for DOCA
ii  doca-gpu-dev                          2.2.0080-1                               amd64        Development files for DOCA GPU
ii  doca-grpc                             2.2.0080-1                               amd64        gRPC runtime capabilities for DOCA
ii  doca-grpc-dev                         2.2.0080-1                               amd64        Development files for DOCA grpc
ii  doca-host-repo-ubuntu2204             2.2.0-0.0.3.2.2.0080.1.23.07.0.5.0.0     amd64        Doca repo bundle package
ii  doca-libs                             2.2.0080-1                               amd64        Data Center on a Chip Architecture (DOCA)
ii  doca-ofed                             2.2.0-0.0.3                              amd64        doca-ofed meta-package
ii  doca-prime-runtime                    2.2.0080-1                               amd64        DOCA prime runtime metapackage
ii  doca-prime-sdk                        2.2.0080-1                               amd64        DOCA prime sdk metapackage
ii  doca-prime-tools                      2.2.0080-1                               amd64        Runtime DOCA Tools
ii  doca-remote-memory-app                22.07.0                                  amd64        Nvidia Bluefield regex benchmarking tool companion remote memory app.
ii  doca-runtime                          2.2.0-0.0.3                              amd64        doca-runtime meta-package
ii  doca-samples                          2.2.0080-1                               amd64        DOCA Samples
ii  doca-sdk                              2.2.0-0.0.3                              amd64        doca-sdk meta-package
ii  doca-services                         2.2.0080-1                               amd64        Runtime utilities associated with DOCA Services
ii  doca-tools                            2.2.0-0.0.3                              amd64        doca-tools meta-package
ii  libdoca-libs-dev                      2.2.0080-1                               amd64        Development files for DOCA Libs

The ConnectX-7 is the same device as above.

Here is a sample program that reproduces the issue:
sample.zip (11.4 KB).

You can switch between separate kernels and a single merged kernel with #define SEPERATE_KERNEL in doca_udp.cu, and between DOCA 2.2 and 2.5 with #define DOCA22 in doca_udp.hpp.

Environment setup:

echo 2048 | sudo tee /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
echo 2048 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 8 | sudo tee /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
echo 8 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
sudo modprobe nvidia-peermem
sudo ip link set dev enp161s0f0np0 mtu 8000 # device you use
sudo ip addr add 192.168.222.2/24 dev enp161s0f0np0 # device you use

Build and run:

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release .. && make -j
sudo ./test_doca_udp

Data sending side:

nping --udp -p <target_port> <target_ip> --data-length <size_in_bytes>

In my case:

nping --udp -p 1234 192.168.222.2 --data-length 1550

We resolved it ourselves.

Before launching the kernels in their separate streams, we launched each kernel once in the default stream and let it run to completion. After that, the kernels ran in parallel.
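A minimal sketch of that workaround, under stated assumptions: rx_kernel and proc_kernel are hypothetical stand-ins, rx_kernel is persistent (it spins until proc_kernel sets a stop flag), and a hypothetical dry_run argument makes each kernel return immediately so it can be started and ended in the default stream. One plausible explanation, not verified against the sample, is CUDA lazy loading (CUDA_MODULE_LOADING=LAZY): loading a kernel on its first launch may need to synchronize the context, which cannot complete while a persistent kernel is spinning. As far as we understand, lazy loading became the default on Linux with the R535 driver series, which would fit 530 working and 545 not. The lazy loading section of the CUDA C++ Programming Guide describes this concurrency pitfall and suggests preloading such kernels (for example with cudaFuncGetAttributes) or setting CUDA_MODULE_LOADING=EAGER; the sketch shows the cudaFuncGetAttributes variant as an alternative.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical persistent "receive" kernel: counts ticks until *stop != 0.
// dry_run != 0 makes it return immediately (used for the warm-up launch).
__global__ void rx_kernel(int dry_run, volatile int *stop, unsigned int *ticks) {
    if (dry_run || threadIdx.x != 0) return;
    while (*stop == 0) {
        atomicAdd(ticks, 1u);
    }
}

// Hypothetical "process" kernel: waits until rx_kernel has made progress,
// then asks it to stop. It can only finish if rx_kernel runs concurrently.
__global__ void proc_kernel(int dry_run, volatile int *stop,
                            const unsigned int *ticks, unsigned int threshold) {
    if (dry_run || threadIdx.x != 0) return;
    while (*(volatile const unsigned int *)ticks < threshold) { }
    *stop = 1;
    __threadfence();
}

// Returns the final tick count (>= threshold on success), or 0 on a CUDA error.
unsigned int run_persistent_pair(bool preload_with_func_attrs, unsigned int threshold) {
    assert(threshold > 0);
    int *stop = nullptr;
    unsigned int *ticks = nullptr;
    unsigned int h_ticks = 0;
    if (cudaMalloc(&stop, sizeof(int)) != cudaSuccess ||
        cudaMalloc(&ticks, sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(stop);  // cudaFree(nullptr) is a no-op
        cudaFree(ticks);
        return 0;
    }
    cudaMemset(stop, 0, sizeof(int));
    cudaMemset(ticks, 0, sizeof(unsigned int));

    if (preload_with_func_attrs) {
        // Alternative: force both kernels to load without running them.
        cudaFuncAttributes attr;
        cudaFuncGetAttributes(&attr, rx_kernel);
        cudaFuncGetAttributes(&attr, proc_kernel);
    } else {
        // The workaround from this thread: start and end each kernel once in the
        // default stream (dry_run = 1 makes them return immediately).
        rx_kernel<<<1, 32>>>(1, stop, ticks);
        proc_kernel<<<1, 32>>>(1, stop, ticks, threshold);
    }
    // Wait for the memsets (and warm-up launches, if any) before using
    // non-blocking streams, which do not synchronize with the default stream.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(stop);
        cudaFree(ticks);
        return 0;
    }

    cudaStream_t s_rx, s_proc;
    cudaStreamCreateWithFlags(&s_rx, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&s_proc, cudaStreamNonBlocking);
    rx_kernel<<<1, 32, 0, s_rx>>>(0, stop, ticks);                 // spins until *stop != 0
    proc_kernel<<<1, 32, 0, s_proc>>>(0, stop, ticks, threshold);  // must run alongside rx_kernel

    if (cudaDeviceSynchronize() == cudaSuccess)
        cudaMemcpy(&h_ticks, ticks, sizeof(h_ticks), cudaMemcpyDeviceToHost);
    cudaStreamDestroy(s_rx);
    cudaStreamDestroy(s_proc);
    cudaFree(stop);
    cudaFree(ticks);
    return h_ticks;
}
```

run_persistent_pair(false, 1000) uses the warm-up launches; run_persistent_pair(true, 1000) uses cudaFuncGetAttributes instead. Both should return a tick count of at least 1000. Running the unmodified program with CUDA_MODULE_LOADING=EAGER set in the environment would be another way to test the lazy loading hypothesis without code changes.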

Has anybody else encountered the same behavior?

Thanks for sharing the solution.

Cheers,
Tom