Wrong data from PCIe card when using cudaMemcpy()

I used an Altera FPGA to implement a PCIe card that exchanges data with a GTX 1060.

I register the PCIe card's address space as pinned memory with cudaHostRegister(pci_data,…). The two devices should then exchange
data by DMA, with the GTX 1060 as master and the PCIe card as slave.

When data goes from the GPU to the PCIe card, with cudaMemcpy(pci_data, dev_data,…), the PCIe card gets the correct data, and the GPU memory rate in Nsight is 2 GB/s.
When data goes from the PCIe card to the GPU, with cudaMemcpy(dev_data, pci_data,…), the GPU gets wrong data: every word is 0xffff_ffff, and the GPU memory rate in Nsight is 9 GB/s. I checked the FPGA signals, and there is no read or write request at all.
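
For anyone who wants to reproduce the check, here is a minimal sketch of the failing direction with every CUDA return code checked and the GPU buffer read back and scanned for 0xffff_ffff. An all-ones value is what a PCIe read typically returns when no device completes the request, which would be consistent with the FPGA seeing no read request. Everything here beyond cudaHostRegister() and cudaMemcpy() is an assumption for illustration: the CHECK macro and count_all_ones() are hypothetical helpers, the buffer size is arbitrary, and pci_data is a plain malloc() stand-in because the post does not show how the card's address space is mapped.

```cuda
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s failed: %s\n", #call,           \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Hypothetical helper: count 32-bit words equal to 0xFFFFFFFF.
static size_t count_all_ones(const uint32_t* buf, size_t n_words) {
    size_t hits = 0;
    for (size_t i = 0; i < n_words; ++i) {
        if (buf[i] == 0xFFFFFFFFu) ++hits;
    }
    return hits;
}

int main() {
    const size_t n_words = 1 << 20;  // arbitrary size, 4 MiB of data
    const size_t bytes = n_words * sizeof(uint32_t);

    // ASSUMPTION: ordinary host memory standing in for the card's mapped
    // address space. On the real system this pointer would be the mapping
    // of the PCIe card, obtained however the driver exposes it.
    uint32_t* pci_data = static_cast<uint32_t*>(std::malloc(bytes));
    if (pci_data == nullptr) return EXIT_FAILURE;
    for (size_t i = 0; i < n_words; ++i) pci_data[i] = static_cast<uint32_t>(i);

    // Pin the range. The runtime also defines cudaHostRegisterIoMemory for
    // memory-mapped I/O ranges; whether that flag applies to this setup is
    // not established by the post.
    CHECK(cudaHostRegister(pci_data, bytes, cudaHostRegisterDefault));

    uint32_t* dev_data = nullptr;
    CHECK(cudaMalloc(reinterpret_cast<void**>(&dev_data), bytes));

    // Card -> GPU, the direction that fails in the post.
    CHECK(cudaMemcpy(dev_data, pci_data, bytes, cudaMemcpyHostToDevice));

    // Read the GPU buffer back into ordinary pageable memory and inspect it.
    std::vector<uint32_t> readback(n_words);
    CHECK(cudaMemcpy(readback.data(), dev_data, bytes, cudaMemcpyDeviceToHost));

    const size_t hits = count_all_ones(readback.data(), n_words);
    std::printf("%zu of %zu words are 0xFFFFFFFF\n", hits, n_words);

    CHECK(cudaFree(dev_data));
    CHECK(cudaHostUnregister(pci_data));
    std::free(pci_data);
    return 0;
}
```

If cudaHostRegister() itself reports an error on the real mapping, that would already explain why the later copy never reaches the card, so the return code is worth looking at before the data.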

Why can't the GPU get the correct data from the PCIe card with cudaMemcpy() when the memory is pinned? How can I solve this problem?

PS: When I use unpinned memory, simply by deleting the cudaHostRegister(pci_data,…) call, the data exchanged between the GTX 1060 and the PCIe card is correct in both directions. But the exchange doesn't use DMA, so the rate is very slow.

Is this DMA technique related to NVIDIA's GPUDirect technology?

If so, that would be limited to Quadro and Tesla boards.