I get a segmentation fault in doca_mmap_create_from_export() that depends on the ordering of DOCA API calls. It occurs when local memory maps (doca_mmap) are created and started on a device before doca_rdma_create() is called for the same device.
DOCA SDK Version: 3.2.0
Platform: Linux amd64, ConnectX-6 Dx with an RTX 5000 Ada GPU
Working order (matches DOCA samples):
- doca_gpu_create()
- doca_rdma_create() + doca_ctx_start()
- doca_gpu_mem_alloc() + doca_mmap_create() / doca_mmap_set_dmabuf_memrange() / doca_mmap_add_dev() / doca_mmap_start() (local mmaps)
- doca_mmap_export_rdma() (export local descriptor)
- doca_mmap_create_from_export() (import remote descriptor): works
Failing order:
- doca_gpu_create()
- doca_gpu_mem_alloc() + doca_mmap_create() / doca_mmap_set_dmabuf_memrange() / doca_mmap_add_dev() / doca_mmap_start() (local mmaps)
- doca_mmap_export_rdma() (export local descriptor)
- doca_rdma_create() + doca_ctx_start()
- doca_mmap_create_from_export() (import remote descriptor): segfault
Details:
- All doca_mmap and doca_rdma operations use the same doca_dev
- The GPU memory is mapped using dmabuf (doca_mmap_set_dmabuf_memrange)
- The exported descriptor is valid (124 bytes, transferred via TCP out-of-band channel)
- The doca_dev pointer is valid and non-null at the time of the crash
- Creating 1 local mmap before doca_rdma_create sometimes works; creating 2 local mmaps (for different memory ranges) reliably triggers the segfault
- All doca_mmap_* calls return DOCA_SUCCESS; no errors are reported before the crash
- The remote side (server) successfully performs doca_mmap_create_from_export with the same code pattern
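To rule out corruption of the descriptor on the out-of-band TCP channel, the descriptor bytes can be fingerprinted on both sides before the import. The following is only a minimal sketch: desc_fingerprint is a hypothetical helper (not part of DOCA) that computes a 32-bit FNV-1a hash over a byte buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of DOCA: 32-bit FNV-1a over a byte buffer.
 * Meant only for comparing the descriptor bytes before sending and
 * after receiving; it says nothing about the descriptor's contents. */
static uint32_t desc_fingerprint(const void *buf, size_t len)
{
    const uint8_t *p = (const uint8_t *)buf;
    uint32_t h = 2166136261u;   /* FNV-1a 32-bit offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];              /* fold in one byte */
        h *= 16777619u;         /* multiply by the FNV 32-bit prime */
    }
    return h;
}
```

Logging desc_fingerprint() over the buffer returned by doca_mmap_export_rdma() on the exporting side, and over the received buffer just before doca_mmap_create_from_export() on the importing side, should print the same value if the 124 bytes arrived intact.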
Reproduction:
Both endpoints share the same GPU (doca_gpu_create() with the same PCIe BDF) but use different IB devices (e.g., mlx5_0 for the server, mlx5_2 for the client). The RDMA connection (RC transport) is fully established before the failing doca_mmap_create_from_export() call.
Question:
Is there a requirement that doca_rdma_create() be called before doca_mmap_create() / doca_mmap_start()? The DOCA programming guide and API reference don't mention this ordering constraint. If this is a known requirement, it would be helpful to have it documented.