IPSec RDMA using BlueField-3

I’m trying to set up IPSec between two nodes and communicate via RoCEv2 RDMA.
I don’t need OVS or StrongSwan for VLAN setup / key exchange; for this setup I can configure keys manually, and I just want to verify that IPSec works for now. The ultimate goal is near-line-rate performance for RDMA traffic encrypted with IPSec.

In short, I want to set up IPSec, run a perftest between the two nodes, and actually see the RoCE packets encapsulated in ESP packets at the switch.

The issues I’m having are:

  • performance is very poor when I set up offloading
  • I’m still seeing plain RoCE packets even after following the directions for IPSec setup.

My setup:

  • two bare-metal servers connected via an L2 switch, each with a BF-3 (Node A and Node B)
  • another node with a NIC that is port-mirrored on the switch, used to verify the packets going to one of the BF-3 servers (Node C)
  • using only one port, no VFs, just the PF (is the use of VFs mandatory for IPSec?)

My Experimentation:

  1. SW-only attempt:
  • Performed SW-only IPSec, saw ESP packets with iperf, and performance was much lower than line rate (expected).
  • For SW-only IPSec, I set up the xfrm states and policies on the host only, with the BF-3 in DPU mode, and I was able to see ESP packets in the packet capture from Node C.
  2. HW offload attempt:
  • Attempted to offload IPSec to the BF-3 hardware: set up the xfrm state and policy directly on the DPU, in the BF-3’s Ubuntu image.
  • With iperf I saw much better performance than SW-only IPSec, but still far short of line rate (somewhere in the 10~20% range of line rate). With perftest, performance was very poor (less than 10% of line rate).
  • Did not see any ESP packets in the packet capture on Node C for either iperf or perftest.

My Method:

  • set up the xfrm states and policies with ip xfrm state/policy ... on both nodes (on the host for SW-only, on the DPU for HW offload), roughly as sketched below
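Roughly, the commands look like the following (illustrative addresses, SPIs, and keys, not my real values; shown for the SW-only case, without the offload keyword):

# Outbound SA (this node -> peer), plain software IPSec, transport mode.
# AES-128-GCM key = 16-byte cipher key + 4-byte salt (20 bytes), ICV length 128.
ip xfrm state add src 192.168.1.1 dst 192.168.1.2 proto esp spi 0x1000 reqid 1 \
    mode transport \
    aead "rfc4106(gcm(aes))" 0x0102030405060708091011121314151617181920 128

# Inbound SA (peer -> this node).
ip xfrm state add src 192.168.1.2 dst 192.168.1.1 proto esp spi 0x2000 reqid 2 \
    mode transport \
    aead "rfc4106(gcm(aes))" 0x2122232425262728293031323334353637383940 128

# Policies selecting the SAs per direction.
ip xfrm policy add src 192.168.1.1 dst 192.168.1.2 dir out \
    tmpl proto esp reqid 1 mode transport
ip xfrm policy add src 192.168.1.2 dst 192.168.1.1 dir in \
    tmpl proto esp reqid 2 mode transport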

Question:

  • What is the expected performance penalty of using IPSec for RDMA traffic?
  • What am I missing / doing wrong in my setup?
  • Why is performance bad when I’m not even seeing encrypted packets? Where is the performance drop coming from? If I set it up incorrectly, I would expect either encrypted packets with bad performance or plaintext packets with good performance, not plaintext packets with bad performance.

Additional comments:

  • It was unclear from the docs which commands should be executed on the host and which on the DPU.
  • The docs often mention the use of StrongSwan or OVS, but I am under the impression that a bare-bones setup between two nodes requires only xfrm states and policies; please correct me if I’m wrong.
  • There are many typos in the IPSec offload for RDMA traffic documentation, which adds to the confusion.

I’m using these links as reference:


Hi leedongjoo96,

Welcome, and thank you for posting your inquiry to the NVIDIA Developer Forums.

If you require more in-depth assistance after reading the below, and you have a valid NVIDIA Enterprise Support Entitlement, we highly recommend opening a support ticket with NVIDIA Enterprise Experience for further triage and assistance.

  1. Configuration location: commands must be executed on the correct side (host vs. DPU).
  • For hardware offload, the xfrm state/policy configuration should be on the DPU.
  • Traffic steering must be properly configured on the host.
  2. VF configuration: while not strictly mandatory, VFs are recommended for proper isolation and performance:
  • Create and configure VFs for the BlueField-3 device.
  • Bind the RDMA applications to the VFs.
  3. ESP encapsulation issue: the absence of ESP packets on the wire indicates the traffic is not being encrypted, i.e. the offload is not taking effect. Verify that (a quick capability check is sketched after this list):
  • the HW crypto capabilities match the selected algorithm
  • RDMA packets carry the proper Traffic Class marking
  • compatible IPsec parameters are used (supported cipher/auth combinations)
  4. Performance optimization:
  • Use a larger MTU (jumbo frames) if the network supports it.
  • Configure proper memory registration for RDMA with IPsec.
  • Verify that QoS settings aren’t limiting throughput.
  5. Required xfrm flags:
  • Add ‘offload dev <interface_name>’ to the xfrm state commands.
  • Add ‘reqid <matching_id>’ to link policies with states.
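As a quick capability check (a sketch; the interface name is taken from the examples below and should be adjusted to your setup, e.g. p0 on the DPU or enp1s0f0 on the host), the netdev must advertise the ESP offload features before any xfrm offload can take effect:

# Check that ESP hardware offload features are present and enabled on the
# side that is supposed to perform the offload (the DPU uplink for HW offload):
ethtool -k p0 | grep -i esp
# Expected when supported:
#   esp-hw-offload: on
#   esp-tx-csum-hw-offload: on
# If these features are missing or fixed to off, the 'offload dev ...' clause
# in the xfrm state cannot take effect.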

Implementation steps:

  1. Reset all configurations
  2. Configure IPsec with proper offload flags on DPU side
  3. Verify hardware capabilities are detected
  4. Follow NVIDIA’s optimized configuration patterns for BlueField-3 IPsec+RDMA

A simple example configuration follows:

Basic xfrm configuration example (Without StrongSwan/LibreSwan):

  1. Host Side Configuration:
   # Load required kernel modules
   modprobe -a xfrm_user

   # Verify hardware crypto capabilities
   cat /sys/class/net/p0/device/mlx5_cap/crypto
  2. DPU Side Configuration (for hardware offload; a note on full packet offload follows this list):
# Configure xfrm state for outbound traffic
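# Note: for AES-GCM the xfrm key below is a placeholder; it must consist of the
# cipher key plus a 4-byte salt (16 + 4 = 20 bytes for AES-128-GCM), followed by
# the ICV length (128).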

ip xfrm state add src 192.168.1.1 dst 192.168.1.2 proto esp spi 0x12345678 \
    reqid 1 mode transport \
    aead "rfc4106(gcm(aes))" 0x0102030405060708091011121314151617181920 128 \
    offload dev enp1s0f0 dir out

# Configure matching state for inbound
ip xfrm state add src 192.168.1.2 dst 192.168.1.1 proto esp spi 0x87654321 \
    reqid 1 mode transport \
    aead "rfc4106(gcm(aes))" 0x0102030405060708091011121314151617181920 128 \
    offload dev enp1s0f0 dir in

# Configure policies
ip xfrm policy add src 192.168.1.1 dst 192.168.1.2 dir out tmpl proto esp mode transport reqid 1
ip xfrm policy add src 192.168.1.2 dst 192.168.1.1 dir in tmpl proto esp mode transport reqid 1

  3. Repeat on the second node with the IP addresses (and in/out directions) reversed.
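A version-dependent note: the example above uses IPsec crypto offload. Recent kernels and iproute2 releases also expose a full packet offload mode (keyword ‘packet’), which offloads the policy lookup as well; whether your BlueField-3 stack supports it depends on the installed kernel and DOCA/OFED versions, and the policy may also need a matching offload clause. A hedged sketch of the outbound state in that mode:

ip xfrm state add src 192.168.1.1 dst 192.168.1.2 proto esp spi 0x12345678 \
    reqid 1 mode transport \
    aead "rfc4106(gcm(aes))" 0x0102030405060708091011121314151617181920 128 \
    offload packet dev enp1s0f0 dir out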

Troubleshooting Steps

  1. Verify offload capability:
ip xfrm state list

Each offloaded SA should show an offload parameters line in the output (e.g. “crypto offload parameters: dev enp1s0f0 dir out”; exact wording depends on the iproute2 version). If that line is missing, the state was installed in software only.

  2. Check for errors:
dmesg | grep -i xfrm
dmesg | grep -i mlx
  3. Verify RDMA traffic marking:

Configure outgoing RDMA traffic with the proper ToS/TC marking so that it matches the IPsec policies.
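One common way to do this on mlx5 devices is the per-device traffic_class attribute (the path and value below are illustrative and depend on your RoCE/QoS configuration):

# Set a global traffic class for RoCE traffic generated via mlx5_0, port 1.
# The value must match what your IPsec policies and switch QoS expect.
echo 106 > /sys/class/infiniband/mlx5_0/tc/1/traffic_class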

  4. Monitor packets:
tcpdump -i enp1s0f0 esp -vv

Performance Optimization

  1. MTU settings:
ip link set dev enp1s0f0 mtu 9000
  2. IRQ affinity:
mlnx_tune -p HIGH_THROUGHPUT
  3. Memory parameters:
echo 16384 > /sys/class/infiniband/mlx5_0/device/ib_dev/iser_max_sectors

Testing Configuration

  1. Test basic connectivity:
ping <remote_ip>
  2. Verify encryption:
  • Run tcpdump on monitoring node

  • Confirm ESP packets are seen on the wire

  3. Performance testing:
ib_write_bw -d mlx5_0 -i 1 -F --report_gbits <remote_ip>
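perftest runs as a server/client pair; start the server on the remote node first (the same command without the peer address), then run the client as shown:

# On the remote node (server):
ib_write_bw -d mlx5_0 -i 1 -F --report_gbits
# On the local node (client):
ib_write_bw -d mlx5_0 -i 1 -F --report_gbits <remote_ip>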

This basic configuration should enable IPsec-protected RDMA traffic using hardware offload without requiring StrongSwan/Libreswan.

Hope this helps you proceed on this issue.

Best regards,
NVIDIA Enterprise Experience