I’m trying to set up IPSec between two nodes and communicate via RoCEv2 RDMA.
I don’t need OVS or StrongSwan for VLAN setup / key exchange; for this setup I can configure keys manually, and for now I just want to verify that IPSec works. The ultimate goal is near line-rate performance for RDMA traffic encrypted with IPSec.
In short, I want to set up IPSec, run perftest between the two nodes, and actually see the RoCE packets encapsulated in ESP packets on the switch.
The issues that I’m having are:
- performance is very poor when I set up offloading
- I’m still seeing plain RoCE packets even after following the directions for IPSec setup.
My setup:
- two bare-metal servers connected via an L2 switch, each with a BF-3 (Node A and Node B)
- another node with a NIC that is port-mirrored via the switch, used to verify the packets going through the NIC of one of the BF-3 servers (Node C)
- using only one port, no VFs, just the PF (is use of VFs mandatory for IPSec?)
My Experimentation:
- SW-only attempt:
- Performed SW-only IPSec, saw ESP packets with iperf, and greatly reduced performance compared with line rate (expected).
- For SW-only IPSec, I set up xfrm states and policies on the host only, with the BF-3 in DPU mode, and I was able to see ESP packets in the packet capture from Node C.
- HW offload attempt:
- Attempted IPSec offloaded to HW on the BF-3: set up xfrm states and policies directly on the DPU, in the Ubuntu image of the BF-3.
- With iperf I saw much better performance than with SW-only IPSec, but still far short of line rate (somewhere in the 10~20% range of line rate). With perftest I saw very poor performance (less than 10% of line rate).
- Did not see any ESP packets in the packet capture on Node C for either iperf or perftest.
My Method:
- set up xfrm states and policies with
ip xfrm state/policy ...
on both nodes (on the host for SW-only, on the DPU for HW-offloaded); a sketch of the shape of these commands follows:
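For reference, the commands I used were of roughly this shape (IPs, SPI, and key below are illustrative placeholders, not my actual values):
# Outbound SA and policy on Node A; the mirrored inbound pair and the
# reverse pair on Node B are omitted for brevity
ip xfrm state add src 192.168.1.1 dst 192.168.1.2 proto esp spi 0x1000 \
reqid 1 mode transport \
aead "rfc4106(gcm(aes))" 0x0102030405060708090a0b0c0d0e0f1011121314 128
ip xfrm policy add src 192.168.1.1 dst 192.168.1.2 dir out tmpl proto esp mode transport reqid 1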
Question:
- what is the expected performance penalty of using IPSec for RDMA traffic?
- what am I missing / doing wrong in my setup?
- why is performance bad when I’m not even seeing encrypted packets? Where is the performance drop coming from? If I set it up incorrectly, I would expect either encrypted packets with bad performance, or plaintext packets with good performance, not plaintext packets and bad performance.
Additional comments:
- it was unclear from the docs which commands should be executed on the host and which on the DPU.
- the docs often mention using StrongSwan or OVS, but I am under the impression that a bare-bones setup between two nodes requires only xfrm states and policies; please correct me if I’m wrong.
- There are many typos in the IPSec offload for RDMA traffic documentation, which adds to the confusion.
I’m using the official NVIDIA documentation links as reference.
Hi leedongjoo96,
Welcome, and thank you for posting your inquiry to the NVIDIA Developer Forums.
If you require more in-depth assistance after reading the below, and you have a valid NVIDIA Enterprise Support Entitlement, we highly recommend opening a support ticket with NVIDIA Enterprise Experience for further triage and assistance.
- Configuration location: commands must be executed on the correct side, host vs. DPU:
- For hardware offload, xfrm state/policy configuration should be on the DPU
- Traffic steering must be properly configured on the host
- VF configuration: while not strictly mandatory, VFs are recommended for proper isolation and performance (a creation sketch follows this list):
- Create and configure VFs for the BlueField-3 device
- Bind VFs to the RDMA applications
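A minimal sketch of VF creation on the host (the PF netdev name p0 is an assumption; adjust for your system):
# Create two VFs on the PF
echo 2 > /sys/class/net/p0/device/sriov_numvfs
# Confirm the VFs and their RDMA devices appeared
lspci | grep -i "virtual function"
ibv_devices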
- ESP encapsulation issue: the absence of ESP packets indicates the offload is not actually taking effect. Verify the following (a quick feature check follows this list):
- HW crypto capabilities match algorithm selection
- Proper Traffic Class marking for RDMA packets
- Compatible IPsec parameters are used (supported cipher/auth combinations)
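A quick way to confirm that the kernel and driver actually expose ESP offload on the netdev is via the ethtool feature flags (interface name is an example):
# esp-hw-offload should read "on"; "off [fixed]" means no HW offload path
ethtool -k enp1s0f0 | grep -i esp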
- Performance optimization:
- Use a larger MTU (jumbo frames) if the network supports it
- Configure proper memory registration for RDMA with IPsec
- Verify QoS settings aren’t limiting throughput
- Required xfrm flags (see the note on offload modes after this list):
- Add ‘offload dev <interface_name>’ to xfrm state commands
- Add ‘reqid <matching_id>’ to link policies with states
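One caveat: recent kernels (6.2+) and iproute2 releases distinguish crypto offload (encrypt/decrypt in HW, the rest of the ESP datapath in SW) from full packet offload. Depending on your kernel and iproute2 versions, the state may need the packet keyword; the lines below are a sketch, so check ip-xfrm(8) on your system:
# Crypto offload:
ip xfrm state add ... offload dev enp1s0f0 dir out
# Full packet offload:
ip xfrm state add ... offload packet dev enp1s0f0 dir out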
Implementation steps:
- Reset all configurations
- Configure IPsec with proper offload flags on DPU side
- Verify hardware capabilities are detected
- Follow NVIDIA’s optimized configuration patterns for BlueField-3 IPsec+RDMA
A simple example xfrm configuration follows (without StrongSwan/LibreSwan):
- Host Side Configuration:
# Load required kernel modules
modprobe -a xfrm_user
# Verify hardware crypto capabilities (sysfs path may vary by driver/DOCA version)
cat /sys/class/net/p0/device/mlx5_cap/crypto
- DPU Side Configuration (for hardware offload):
# Configure xfrm state for outbound traffic
# Note: AES-GCM must be specified as an AEAD algorithm; the key material is
# 16 bytes of key + 4 bytes of salt (example value below), with a 128-bit ICV
ip xfrm state add src 192.168.1.1 dst 192.168.1.2 proto esp spi 0x12345678 \
reqid 1 mode transport \
aead "rfc4106(gcm(aes))" 0x0102030405060708090a0b0c0d0e0f1011121314 128 \
offload dev enp1s0f0 dir out
# Configure matching state for inbound
ip xfrm state add src 192.168.1.2 dst 192.168.1.1 proto esp spi 0x87654321 \
reqid 1 mode transport \
aead "rfc4106(gcm(aes))" 0x0102030405060708090a0b0c0d0e0f1011121314 128 \
offload dev enp1s0f0 dir in
# Configure policies
ip xfrm policy add src 192.168.1.1 dst 192.168.1.2 dir out tmpl proto esp mode transport reqid 1
ip xfrm policy add src 192.168.1.2 dst 192.168.1.1 dir in tmpl proto esp mode transport reqid 1
- Repeat on the second node, swapping the in/out directions on the states and policies (the same SPIs and keys are used on both sides).
Troubleshooting Steps
- Verify offload capability:
ip xfrm state list
Look for the offload parameters line in the output (example below); there is no simple yes/no flag.
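For a crypto-offloaded SA, the output of ip xfrm state should contain a line similar to:
crypto offload parameters: dev enp1s0f0 dir out
(for full packet offload the line reads "packet offload parameters" instead). If no such line appears, the SA was installed in software only.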
- Check for errors:
dmesg | grep -i xfrm
dmesg | grep -i mlx
- Verify RDMA traffic marking:
Configure outgoing RDMA traffic with proper TOS/TC marking so that it matches the IPsec policies, for example:
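As one hedged example of such marking (the ToS value 106 and device name are assumptions): RDMA-CM-based applications can inherit a default ToS via configfs, and perftest also accepts a ToS flag:
# Default ToS for RDMA CM connections (requires configfs mounted)
mkdir -p /sys/kernel/config/rdma_cm/mlx5_0
echo 106 > /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_tos
# Or set it per run in perftest:
ib_write_bw --tos=106 ...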
- Monitor packets:
tcpdump -i enp1s0f0 esp -vv
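Since RoCEv2 is carried over UDP destination port 4791, it is also worth watching for plaintext RDMA traffic, which should disappear once offload is active end to end:
tcpdump -i enp1s0f0 udp dst port 4791 -vv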
Performance Optimization
- MTU settings:
ip link set dev enp1s0f0 mtu 9000
- IRQ affinity:
mlnx_tune -p HIGH_THROUGHPUT
- Memory parameters (relevant to iSER workloads; sysfs path may vary by driver version):
echo 16384 > /sys/class/infiniband/mlx5_0/device/ib_dev/iser_max_sectors
Testing Configuration
- Test basic connectivity:
ping <remote_ip>
- Verify encryption:
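Capture on the mirror node (Node C) and confirm the RDMA traffic is ESP-encapsulated (interface name is a placeholder):
tcpdump -i <mirror_iface> esp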
- Performance testing:
ib_write_bw -d mlx5_0 -i 1 -F --report_gbits <remote_ip>
This basic configuration should enable IPsec-protected RDMA traffic using hardware offload without requiring StrongSwan/Libreswan.
Hope this helps you proceed on this issue.
Best regards,
NVIDIA Enterprise Experience