Hi, all.
I have recently been working on an “FPGA + GPU” platform, where the FPGA and the GPU (TX2) are connected through a PCIe Gen2 x4 bus.
By executing
'lspci -vv'
we can see our FPGA (a PCIe endpoint device):
Xilinx Memory Controller, 7024, 10EE
link cap = Gen2 x4 MaxPayloadSize = 128B
... ...
This implies that the FPGA has been recognized by the TX2 over the PCIe Gen2 x4 bus.
However, during driver development we have run into a problem with PCIe Master Write,
i.e.,
the FPGA (obviously the DMA master) actively writes to the GPU (TX2).
The workflow is as follows:
I. In the driver, we allocate a DMA-consistent buffer via the Linux DMA API function
virAddr = pci_alloc_consistent(pdev, 4096, &busAddr);
where
'virAddr' is the kernel virtual address,
'pdev' is the pointer to the device data structure (representing the PCI device, i.e., the FPGA),
'4096' is the buffer size in bytes (the DMA test assumes a 4096-byte Master DMA Write transaction), and
'busAddr' receives the bus address.
Please note that our device uses 64-bit addressing for the Master DMA Write transaction,
and in the earlier part of our driver's probe procedure, the following call also succeeded:
pci_set_dma_mask(pdev, DMA_BIT_MASK(64))
which means the TX2 architecture allows 64-bit addressing with our device.
Please also note a strange problem: the bus address returned on the TX2 is always '0x0000-0000-8000-0000',
regardless of the method we choose for DMA buffer allocation. In fact, we tried a lot of alternatives, including
a:
virAddr = __get_free_pages(GFP_KERNEL, 0);
busAddr = pci_map_single(pdev, virAddr, 4096, PCI_DMA_FROMDEVICE);
b:
virAddr = pci_alloc_consistent(pdev, 4096, &busAddr);
But in either case, the returned bus address 'busAddr' is always the constant value '0x0000-0000-8000-0000'.
Intuitively, I think something is wrong here.
II. Pass the returned 'busAddr' (as u64) and the length in bytes (4096 in our test) to the FPGA through the PCI BAR-0
memory region. We have confirmed that the corresponding registers in the FPGA hold the requested
values (i.e., the configured 'busAddr' and the length in bytes).
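For reference, step II can be sketched in plain C as below. This is only a user-space model under assumptions: the register offsets (REG_DMA_ADDR_LO, REG_DMA_ADDR_HI, REG_DMA_LEN) and the split of the 64-bit bus address into two 32-bit registers are hypothetical and not taken from our FPGA design, and 'struct fake_bar0' stands in for the mapped BAR-0 window that a real driver would access through ioremap'ed MMIO with writel()/readl().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical BAR-0 register layout; the real offsets depend on the FPGA design. */
enum {
    REG_DMA_ADDR_LO = 0x00, /* assumed: low 32 bits of the bus address  */
    REG_DMA_ADDR_HI = 0x04, /* assumed: high 32 bits of the bus address */
    REG_DMA_LEN     = 0x08, /* assumed: transfer length in bytes        */
    REG_COUNT       = 3
};

/* Stand-in for the mapped BAR-0 window (plain memory here, MMIO in a driver). */
struct fake_bar0 {
    uint32_t regs[REG_COUNT];
};

void bar0_write32(struct fake_bar0 *bar, unsigned off, uint32_t val)
{
    assert(off % 4 == 0 && off / 4 < REG_COUNT);
    bar->regs[off / 4] = val;
}

uint32_t bar0_read32(const struct fake_bar0 *bar, unsigned off)
{
    assert(off % 4 == 0 && off / 4 < REG_COUNT);
    return bar->regs[off / 4];
}

/* Split the 64-bit bus address into two 32-bit halves and program the target. */
void program_dma_target(struct fake_bar0 *bar, uint64_t bus_addr, uint32_t len_bytes)
{
    bar0_write32(bar, REG_DMA_ADDR_LO, (uint32_t)(bus_addr & 0xFFFFFFFFu));
    bar0_write32(bar, REG_DMA_ADDR_HI, (uint32_t)(bus_addr >> 32));
    bar0_write32(bar, REG_DMA_LEN, len_bytes);
}

/* Read the address registers back and reassemble the 64-bit value. */
uint64_t read_back_dma_target(const struct fake_bar0 *bar)
{
    return ((uint64_t)bar0_read32(bar, REG_DMA_ADDR_HI) << 32) |
           bar0_read32(bar, REG_DMA_ADDR_LO);
}
```

With the bus address from our test, 0x0000-0000-8000-0000, the high register holds 0 and the low register holds 0x80000000.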
III. Then start the Master DMA Write (a repeating sequence of incrementing numbers: 0x0, 0x1, 0x2, 0x3, ..., 0x1F,
0x0, 0x1, 0x2, 0x3, ..., 0x1F, 0x0, 0x1, 0x2, 0x3, ..., 0x1F, 0x0, 0x1, 0x2, 0x3, ...) by setting the DMA-WRITE-ENABLE
bit in our FPGA through BAR-0 access.
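The test pattern from step III can be modelled as below, which is also how the driver could check the buffer once a transfer completes. This is a sketch under assumptions: each number is taken to be one 32-bit word stored in host byte order, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PATTERN_PERIOD 32u /* values 0x0 .. 0x1F, then the sequence repeats */

/* Fill buf with the pattern the FPGA is expected to write. */
void fill_expected_pattern(uint32_t *buf, size_t len_bytes)
{
    assert(len_bytes % sizeof(uint32_t) == 0);
    for (size_t i = 0; i < len_bytes / sizeof(uint32_t); i++)
        buf[i] = (uint32_t)(i % PATTERN_PERIOD);
}

/* Return the index of the first word that deviates from the pattern, or -1 if none does. */
long first_pattern_mismatch(const uint32_t *buf, size_t len_bytes)
{
    assert(len_bytes % sizeof(uint32_t) == 0);
    for (size_t i = 0; i < len_bytes / sizeof(uint32_t); i++) {
        if (buf[i] != (uint32_t)(i % PATTERN_PERIOD))
            return (long)i;
    }
    return -1;
}
```

For the 4096-byte test buffer this covers 1024 words, i.e., 32 full repetitions of 0x0 to 0x1F.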
IV. We have observed, in the FPGA, that correct TLP packets (MaxPayloadSize = 128B) are generated and sent to
the TX2, i.e.,
a sequence of 4096/128 = 32 MWr packets with 128B of payload each, as follows:
MWr packet 00: 0x60000020 0x010000FF 0x00000000 0x80000000 0x00000000 0x00000001 ... 0x0000001F
MWr packet 01: 0x60000020 0x010000FF 0x00000000 0x80000080 0x00000000 0x00000001 ... 0x0000001F
MWr packet 02: 0x60000020 0x010000FF 0x00000000 0x80000100 0x00000000 0x00000001 ... 0x0000001F
...
MWr packet 31: 0x60000020 0x010000FF 0x00000000 0x80000F80 0x00000000 0x00000001 ... 0x0000001F
where 0x60000020 indicates that each MWr packet is a Memory Write (the FPGA writes data to the TX2's memory)
packet with
#1: 128 bytes of payload (20 means 0x20 32-bit words, i.e., 128B, as noted in the PCIe spec v2.1)
#2: a 4DW header (i.e., the TLP header is composed of four 32-bit words), i.e.,
0x60000020 0x010000FF 0x00000000 0x8000XXXX
where XXXX represents the varying address offset of each MWr packet
#3: 0x60, or in binary form '0-11-00000', indicates that the TLP has a 4DW header and carries payload data,
which agrees with #1 and #2
#4: 0100 corresponds to the 'bus number & device number & function number', which is verified
to be correct; otherwise, the 'busAddr' and the length in bytes could not be configured through
BAR-0 access at all (we issue reads to these registers, which cause the FPGA to send
back the register contents in a CplD TLP packet, and this uses the combination of
'bus number & device number & function number'. Since we successfully read the configured
values, 0100 must be valid and correct)
In each MWr packet, 0x00000000 0x00000001 ... 0x0000001F is generated as the 128B payload.
It is clear from this that the FPGA side works fine.
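To make the field breakdown in #1 to #4 easy to check, here is a small decoder for the four header words as they are printed above. It is a sketch, not our FPGA or driver code: it assumes the words are listed DW0 first with header byte 0 in the most significant byte, uses the 2-bit Fmt field of the PCIe v2.1 layout, and handles only the 4DW Memory Write case. The names 'decode_mwr_4dw' and 'mwr_packet_address' are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 4DW Memory Write header, as laid out in the dump above. */
struct mwr_header {
    unsigned fmt;          /* DW0 bits 30:29; 3 (binary 11) = 4DW header with data */
    unsigned type;         /* DW0 bits 28:24; 0 = memory request                   */
    unsigned length_dw;    /* DW0 bits 9:0, payload length in 32-bit words         */
    unsigned bus, dev, fn; /* requester ID, DW1 bits 31:16                         */
    unsigned tag;          /* DW1 bits 15:8                                        */
    unsigned last_be;      /* DW1 bits 7:4, last DW byte enables                   */
    unsigned first_be;     /* DW1 bits 3:0, first DW byte enables                  */
    uint64_t address;      /* DW2 = address bits 63:32, DW3 = address bits 31:2    */
};

struct mwr_header decode_mwr_4dw(const uint32_t dw[4])
{
    struct mwr_header h;
    uint32_t len = dw[0] & 0x3FFu;
    uint32_t requester_id = dw[1] >> 16;

    h.fmt = (dw[0] >> 29) & 0x3u;
    h.type = (dw[0] >> 24) & 0x1Fu;
    assert(h.fmt == 3u && h.type == 0u); /* only 4DW MWr is handled here */

    h.length_dw = len ? len : 1024u;     /* a length field of 0 encodes 1024 DW */
    h.bus = (requester_id >> 8) & 0xFFu;
    h.dev = (requester_id >> 3) & 0x1Fu;
    h.fn = requester_id & 0x7u;
    h.tag = (dw[1] >> 8) & 0xFFu;
    h.last_be = (dw[1] >> 4) & 0xFu;
    h.first_be = dw[1] & 0xFu;
    h.address = ((uint64_t)dw[2] << 32) | (dw[3] & ~0x3u);
    return h;
}

/* Target address of the index-th packet when the buffer is sent in equal chunks. */
uint64_t mwr_packet_address(uint64_t base, unsigned index, unsigned payload_bytes)
{
    return base + (uint64_t)index * payload_bytes;
}
```

Applied to packet 31, this gives a length of 32 DW (128B), requester bus 1, device 0, function 0, byte enables 0xF/0xF, and address 0x80000F80, which equals 0x80000000 + 31 * 128.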
V. BUT it turned out that the TX2 hangs soon after the FPGA initiates the DMA Master Write, as enabled by the
Linux driver through the BAR-0 register access (setting the DMA-Master-Write start bit to 1).
So, can anyone help me with this problem? It does not make sense at all…