Can't access 3rd-party PCIe device after cudaHostRegister()


My setup:
Ubuntu 20.04
CUDA 11.6
GPU: A100

I need to access an FPGA PCIe device from a GPU kernel. The PCIe BAR is mapped and I'm able to read/write it from host code. I'm trying to map the FPGA's BAR0 address into the GPU address space so it can be accessed from a GPU kernel.
Before I call cudaHostRegister() I read valid values from the FPGA. Once I call cudaHostRegister(), I read invalid values from the same address (still the host virtual address).

The mapping:

fd_fpga = open("/dev/mem", O_RDWR | O_SYNC);
fpga_reg = (unsigned long *)mmap(0, FPGA_BAR0_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_LOCKED, fd_fpga, FPGA_PHY_ADDRESS_REG_START);

The CU code (still on host):

void runTimerTest(unsigned long* fpga_reg)
{
	// setup execution parameters
	unsigned long h_answer;
	unsigned long* d_fpga_reg;

	unsigned long timerTestRead = *(unsigned long*)((unsigned char*)fpga_reg + 0x58);
	printf("FPGA Timer test read: %lu \n", timerTestRead);
	printf("cudaHostRegister host virtual address: %p ", (void*)fpga_reg);
	checkCudaErrors(cudaHostRegister(fpga_reg, 0x1000, cudaHostRegisterIoMemory));
	checkCudaErrors(cudaHostGetDevicePointer((void**)&d_fpga_reg, (void*)fpga_reg, 0));
	printf("as gpu address: %p\n", (void*)d_fpga_reg);

	h_answer = *(unsigned long*)((unsigned char*)fpga_reg + 0x58);
	std::cout << "FPGA Timer test read: " << h_answer << std::endl;
}

The results of accessing the register at offset 0x58:
FPGA Timer test read: 3836528439 ← this is a good value
cudaHostRegister host virtual address: 0x7f87335b8000 as gpu address: 0x7f87335b8000
FPGA Timer test read: 0 ← this is not a valid value

I also expected the GPU address returned by cudaHostGetDevicePointer() not to be identical to the host virtual address for the FPGA.

Can you please assist?

You are doing mmap, and cudaHostRegister() also does an mmap. I don't think that will work. NVIDIA doesn't provide any instructions or methodology for accessing 3rd-party PCIe devices that way.

The typical approach for accessing 3rd-party PCIe devices is to write a GPUDirect RDMA driver for the device. I won't be able to give you a recipe.

Hi @Robert_Crovella, thanks for your fast response.

  1. My use case is similar to
    The fact that cudaHostRegister() supports the cudaHostRegisterIoMemory flag indicates that kernels should be able to write to PCIe IO-mapped memory, right?

  2. I realized I had an optimization bug which misled me into believing cudaHostRegister() had created a mapping problem.
    After fixing the optimization problem I'm able to read from the PCIe device's IO space using the GPU kernel.
    My device has canUseHostPointerForRegisteredMem == 1, which explains why the host pointer and device pointer have the same value.

  3. My goal is to minimize the latency of getting my kernel's data to the FPGA behind the PCIe device, without waiting for synchronization to end.
    I need to know whether this is the right technique to reduce CPU OS and driver involvement, i.e. going through a hardware-only path (memory controllers).
    Can you confirm that memory accesses from the kernel to the PCIe IO-mapped space use a hardware-only path?
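(Aside: the canUseHostPointerForRegisteredMem property mentioned in point 2 can be queried at runtime with cudaDeviceGetAttribute — a minimal sketch, assuming device 0:)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int can_use = 0;
    // Query whether this device can use a registered host pointer directly
    // as a device pointer; when 1, cudaHostGetDevicePointer() returns the
    // same virtual address that was passed to cudaHostRegister().
    cudaError_t err = cudaDeviceGetAttribute(
        &can_use, cudaDevAttrCanUseHostPointerForRegisteredMem, /*device=*/0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("canUseHostPointerForRegisteredMem = %d\n", can_use);
    return 0;
}
```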

On Wed, Oct 26, 2022 at 17:31, Robert_Crovella via NVIDIA Developer Forums <> wrote:

No, sorry, I can’t respond to or confirm any of that. Perhaps others here will have some input. Sounds like you got it working.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.