Clarification on NVMeOF Target Offload Behavior

Dear NVIDIA and Mellanox Teams,

I have been actively conducting experiments with NVMe over Fabrics (NVMeOF), and thus far, the functionality has been meeting expectations. To monitor the IO flow, I have been utilizing iostat -t 1 on both the target and host systems.

Recently, I followed a guide to enable target offload, which I'm pleased to report was successful. I observed minimal target CPU overhead while the host performed IO over NVMeOF. However, I made a curious observation: while the host system displayed IO activity as expected, the target side showed no IO activity at all when monitored with iostat -t 1. I would like clarification on this behavior: is it indeed the case that the target CPU is not involved in the IO path at all?

Furthermore, I would like to ask about NVMe-oF target offload with multiple NVMe devices. According to the guide and earlier information, this did not seem possible. However, I have managed to set up multiple SSDs on the target side and let the host connect to all of them. The aggregated bandwidth stays within the capacity of the single port, as expected. Is there any update on this matter?

I would greatly appreciate any insights or explanations you can provide regarding these queries.

I do not have an explanation for the iostat behavior, though; the CPUs on the target side should likewise show minimal overhead.
I would check with top or htop.
You can also use our mlnx_perf -i <interface> utility (provided with our MLNX drivers) to check RX/TX activity.
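For example, a minimal monitoring pass on the target might look like the following (the interface and device names are placeholders; adjust them to your setup):

```shell
# Block-layer statistics: these stay idle for offloaded I/O,
# because offloaded requests bypass the target's kernel block layer
iostat -t 1 /dev/nvme0n1

# Per-core CPU utilization (top or htop)
htop

# NIC RX/TX counters, which do reflect offloaded traffic
# (ens1f0 is a placeholder for the ConnectX interface name)
mlnx_perf -i ens1f0
```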
The limitation for NVMe-oF target offload is no longer accurate (i.e., "Currently, an offloaded subsystem can be associated with only one namespace" no longer applies).
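As a rough sketch of what is now possible, assuming the standard nvmet configfs layout and the attr_offload attribute described in the guide (the subsystem name and device path are placeholders):

```shell
# Inside an existing offload-enabled subsystem
cd /sys/kernel/config/nvmet/subsystems/testsubsys
echo 1 > attr_offload                    # target offload on

# Attach a second namespace -- formerly limited to one per offloaded subsystem
mkdir namespaces/2
echo -n /dev/nvme1n1 > namespaces/2/device_path
echo 1 > namespaces/2/enable
```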


Hi Spruitt,

Thank you for your prompt response and confirmation. I would appreciate further clarification on the recent updates regarding target offload, specifically the current mapping between NVMe devices, subsystems, namespaces, and offload ports. I have managed to map multiple NVMe SSDs with target offload enabled to my ConnectX-5, which has only one physical port. My approach assigns each NVMe device its own subsystem and namespace. All NVMe SSDs share the same IB connection IP, since the ConnectX-5 has only one port. My understanding is that these NVMe SSDs therefore "share" the target offload capacity (IOPS and bandwidth) of the ConnectX-5. Is this understanding correct?
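For concreteness, my configuration is roughly the following sketch (the NQNs, device paths, and port number are placeholders; port 1 is assumed to be configured already):

```shell
# One offloaded subsystem per SSD, all bound to the same RDMA port
for i in 0 1 2; do
  sub=/sys/kernel/config/nvmet/subsystems/nqn.2024-01.example:ssd$i
  mkdir "$sub"
  echo 1 > "$sub/attr_allow_any_host"
  echo 1 > "$sub/attr_offload"
  mkdir "$sub/namespaces/1"
  echo -n /dev/nvme${i}n1 > "$sub/namespaces/1/device_path"
  echo 1 > "$sub/namespaces/1/enable"
  ln -s "$sub" /sys/kernel/config/nvmet/ports/1/subsystems/
done
```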

Regarding the iostat issue, it appears that the target kernel is bypassed, so no I/O traffic is observed there (although PCIe traffic is certainly present). Additionally, I have noticed minimal CPU utilization on the target side, as observed with tools like htop or perf top.