Up front: I’m not sure this is the right forum for this, but there are probably some folks here who know something about the hardware structure of the Tegra K1, unlike the places I asked before (e.g. the Xilinx forums), so maybe someone can throw some new insights at me…
Or points me to another (sub-)forum ;)
What’s this about:
I have an Avionic Design “Meerkat” board with the NVIDIA Tegra K1 on it.
It runs L4T Linux, kernel 3.10.105, plus (I think) some Avionic-specific drivers from their GitHub.
An FPGA devboard (Artix-7) is connected to the ARM CPU on the TK1 via PCI Express.
The FPGA design defines some register blocks and a 64 KB memory area, both of which are memory-mapped.
I have attempted to communicate with the FPGA in two ways:
via the /dev/mem driver, doing two mmap()'ings: one for the registers, one for the memory area used for the main data transfer.
The Xilinx wiki page called “Accessing BRAM In Linux” claims that opening /dev/mem with the O_SYNC flag makes the mapping of the physical (FPGA/PCIe) addresses into virtual memory non-cached, and the very slow access I see would confirm this to be the case.
Copying the 64K a few dozen times yields a transfer rate of ~2 MByte/s, which I would consider extremely slow for PCIe (not to mention 4 lanes). The copy is done with a for loop over 32-bit words, though, as memcpy freezes.
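For reference, my /dev/mem access boils down to something like the sketch below. The physical base address you'd pass in is a placeholder for whatever the BAR actually gets assigned, not a real value from my system:

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a physical window (e.g. the PCIe BAR of the FPGA) via /dev/mem.
 * O_SYNC is what the Xilinx "Accessing BRAM In Linux" page suggests
 * in order to get a non-cached mapping. */
volatile uint32_t *map_window(off_t phys_base, size_t len)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, phys_base);
    close(fd);  /* the mapping stays valid after close() */
    return (p == MAP_FAILED) ? NULL : (volatile uint32_t *)p;
}

/* Word-by-word copy out of the window: memcpy() on the uncached
 * mapping freezes for me, so everything goes through 32-bit loads. */
void copy_words(uint32_t *dst, const volatile uint32_t *src, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
}
```

Something like `map_window(bar_addr, 0x10000)` then maps the 64K area, and `copy_words()` is the loop that yields the ~2 MB/s.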
But: I see updates to the memory-mapped registers (used e.g. for the hand-shake) reliably only when I have a GDB breakpoint in the CPU code right before the register read.
With my limited knowledge of these things, this still sounds cache-related to me?
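For completeness, the hand-shake side is essentially a polling loop like the one below (the register offset is made up, not my real layout). I'm adding the volatile angle as a comment because, as far as I understand, a hoisted load would produce exactly this kind of symptom, though I can't say that's what is happening here:

```c
#include <stdint.h>

/* Hypothetical status-register word index -- not my real layout. */
#define STATUS_WORD 0

/* Spin until the FPGA sets `mask` in the status register. `regs` is
 * the mmap()'ed register block. The volatile qualifier forces the
 * compiler to re-load the register on every iteration; if the read
 * were hoisted out of the loop by the optimizer, that would show
 * exactly this "only works under a GDB breakpoint" behavior. */
uint32_t wait_for_flag(const volatile uint32_t *regs, uint32_t mask)
{
    uint32_t v;
    while (((v = regs[STATUS_WORD]) & mask) == 0)
        ;  /* spin; a real version should have a timeout */
    return v;
}
```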
I compiled the Xilinx XDMA driver provided in Xilinx_Answer_65444_Linux against the L4T kernel source, targeting ARM gnueabihf.
I am aware that, in the given form, the driver is spec’d for x86 systems only. I was told the limitation probably exists because non-x86 systems, such as (maybe?) the ARM Cortex-A15 on the TK1, can be non-cache-coherent with respect to DMA, so the CPU may still read stale data from its cache while the DMA engine writes directly to RAM. Can someone confirm whether this is the case on the TK1?
So the driver probably needs to be modified. Since that’s a road block for me (no kernel dev experience so far), I first tried out what happens as-is.
So, while blindly copying the 64K block a number of times via DMA (using /dev/xdma0_c2h0) yields a usable 200 MB/s, my hand-shaking registers (accessed through /dev/xdma0_user, which bypasses DMA and is again mmap()'ed into user space) still show the same problem:
It only works when there is a breakpoint.
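The DMA path itself is just reads on the character device; with the XDMA driver, the file offset of a read corresponds to the AXI address on the card, so my 64K copy is roughly the following (device name as the driver created it on my system, AXI address a placeholder):

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Read `len` bytes starting at AXI address `addr` through an XDMA
 * card-to-host channel device (e.g. /dev/xdma0_c2h0). pread() keeps
 * the offset handling in a single call. */
ssize_t xdma_read(const char *dev, void *buf, size_t len, off_t addr)
{
    int fd = open(dev, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, buf, len, addr);
    close(fd);
    return n;
}
```

Timing a few hundred calls of `xdma_read("/dev/xdma0_c2h0", buf, 0x10000, 0)` is where my ~200 MB/s figure comes from.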
Now if I saw this only with the “x86 only!” XDMA driver, I wouldn’t be surprised, for the reasons outlined above. But with /dev/mem, which is part of Linux itself, too?
I don’t have much of an idea what to further look for.
Does this make sense to anyone here?
Can someone point me to the proper diagnostic tools to use, for this kind of scenario? At the moment it’s a bit like poking at an alien space probe with a stick.
(I have no kernel dev experience, only done some bare metal MCU work)
PS: sorry for not posting the relevant web links, but when my first post on another forum contained links, it was flagged as spam in the blink of an eye.