I need to access an FPGA PCIe device from a GPU kernel. The PCIe BAR is mapped and I'm able to read/write from host code. I'm trying to map the FPGA's BAR0 address into GPU address space so it can be accessed from the GPU kernel.
Before I call cudaHostRegister() I read valid values from the FPGA. Once I call cudaHostRegister(), I read invalid values from the same address (still the host virtual address).
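For reference, here is roughly the sequence I'm using (a minimal sketch; the sysfs path, PCI BDF, and BAR0 size are placeholders for my setup, and error checking is omitted):

```c
#include <cuda_runtime.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BAR0_SIZE 0x10000  /* placeholder: actual BAR0 size of my FPGA */

int main(void)
{
    /* Placeholder BDF; resource0 in sysfs exposes BAR0 for mmap. */
    int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0",
                  O_RDWR | O_SYNC);

    volatile uint32_t *bar0 = (volatile uint32_t *)
        mmap(NULL, BAR0_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* Host-side read of the timer register at offset 0x58. */
    printf("FPGA Timer test read: %u\n", bar0[0x58 / 4]);

    /* Register the MMIO range with CUDA and get a device pointer. */
    cudaHostRegister((void *)bar0, BAR0_SIZE,
                     cudaHostRegisterIoMemory | cudaHostRegisterMapped);

    void *dev_ptr = NULL;
    cudaHostGetDevicePointer(&dev_ptr, (void *)bar0, 0);
    printf("host %p mapped as gpu %p\n", (void *)bar0, dev_ptr);

    munmap((void *)bar0, BAR0_SIZE);
    close(fd);
    return 0;
}
```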
The results of accessing register at offset 0x58:
FPGA Timer test read: 3836528439 ← This is a good value
cudaHostRegister host virtual address: 0x7f87335b8000 as gpu address: 0x7f87335b8000
FPGA Timer test read: 0 ← This is not a valid value
I also expect the GPU address returned by cudaHostGetDevicePointer() not to be identical to the host virtual address for the FPGA.
You are doing mmap. cudaHostRegister also does mmap. I don't think that will work. NVIDIA doesn't have any instructions or methodology for accessing third-party PCIe devices that way.
I realized I had an optimization bug which misled me into believing cudaHostRegister() created a mapping problem.
After fixing the optimization problem I'm able to read from the PCIe device's I/O space in the GPU kernel.
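For anyone who hits the same symptom, the device-side read now looks roughly like this (a sketch; using a volatile pointer for the MMIO range is my assumption for defeating the optimization, since it forces a real load on every access):

```c
#include <stdint.h>

/* Sketch: device-side read of the FPGA timer register. The volatile
 * qualifier is my assumption for avoiding the optimization problem:
 * it makes the compiler emit a real load for each MMIO access
 * instead of caching or eliminating it. */
__global__ void read_timer(volatile uint32_t *bar0, uint32_t *out)
{
    *out = bar0[0x58 / 4];  /* timer register at offset 0x58 */
}

/* Launched with the pointer from cudaHostGetDevicePointer(), e.g.
 * read_timer<<<1, 1>>>((volatile uint32_t *)dev_ptr, out); */
```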
My device has canUseHostPointerForRegisteredMem == 1, which explains why the host pointer and device pointer have the same value.
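For reference, the attribute can be checked like this (sketch, assuming device 0):

```c
/* Query whether registered host pointers can be used directly on the GPU. */
int can_use = 0;
cudaDeviceGetAttribute(&can_use,
                       cudaDevAttrCanUseHostPointerForRegisteredMem, 0);
/* When can_use == 1, cudaHostGetDevicePointer() returns the same value
 * as the host virtual address, matching what I observed above. */
```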
My goal is to minimize the latency of getting my kernel's data to the FPGA behind the PCIe device without waiting for a synchronization to complete.
I need to know whether this is the right technique to reduce CPU OS and CPU driver involvement by going through a HW-only path (memory controllers).
Can you confirm that memory access from the kernel to the PCIe I/O-mapped space uses a HW-only path?
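To make the question concrete, the pattern I'm aiming for looks roughly like this (a sketch; both register offsets are hypothetical for my FPGA):

```c
#include <stdint.h>

/* Sketch: the kernel pushes a result straight into FPGA MMIO space,
 * with no host-side synchronization in the path. Offsets 0x100 and
 * 0x104 are hypothetical registers on my FPGA. */
__global__ void push_result(volatile uint32_t *bar0, uint32_t value)
{
    bar0[0x100 / 4] = value;   /* hypothetical data register */
    __threadfence_system();    /* order the writes as seen system-wide */
    bar0[0x104 / 4] = 1;       /* hypothetical doorbell register */
}
```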