My PCIe board with an FPGA currently works for everything that does not involve DMA.
dma_map_single_attrs, dma_alloc_coherent and friends all fail for me.
I am struggling to find a good way to translate the address of a chunk of memory.
I can obtain what I presume is a suitable block from get_free_pages, but getting a bus address suitable for my board's DMA engine is not something I seem to be able to work out on my own.
Hi, thanks for the reply. Those are, however, not my actual problem.
I have a PCIe device working and I have a block of RAM which could be suitable to copy the data into.
I need to translate this address to an address that I can pass over PCIe to my PCIe device so that it can transfer data into this memory.
On other platforms there are Linux kernel calls that do this; on Tegra, however, I am unsure how to do it.
pci_alloc_consistent maps to dma_alloc_coherent, and both refuse to give me chunks of 256 KB or larger. I need, preferably, 16 chunks of 4 MB each. This is no problem on x86, where the function works fine and so does our FPGA. My quest began precisely because the Tegra platform does not yield memory to me in the way you suggest.
__get_free_pages(__GFP_DMA, 10) does give me a virtual address (and, I assume, a physically contiguous 4 MB chunk of memory), but passing the address I get from virt_to_phys to my FPGA and trying to write there locks up the system pretty consistently.
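For anyone checking the size assumption: the second argument of __get_free_pages is an order, and the block size is the page size shifted left by that order. A small user-space sketch of the arithmetic, assuming 4 KiB pages (the helper names are made up for illustration and are not kernel APIs):

```c
#include <stddef.h>

/* Assumption: 4 KiB pages (the non-64K-page arm64 configuration). */
#define ASSUMED_PAGE_SIZE 4096UL

/* Bytes covered by one page-allocator block of the given order. */
static unsigned long order_to_bytes(unsigned int order)
{
    return ASSUMED_PAGE_SIZE << order;
}

/* Smallest order whose block is at least `bytes` long. */
static unsigned int bytes_to_order(unsigned long bytes)
{
    unsigned int order = 0;

    while (order_to_bytes(order) < bytes)
        order++;
    return order;
}
```

Under that assumption order_to_bytes(10) is 4194304 bytes, so order 10 does correspond to a 4 MiB block, and bytes_to_order(256 * 1024) is 6. This only confirms the size; it says nothing about whether the address is usable for DMA.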
In an ideal world I would also map this same space into a CUDA and/or OpenGL context once I am done transferring an NV12 image into it.
I really wish my desktop wasn’t fried right now (parts on order), but as I recall there is a kernel configuration option which I believe is disabled in the default TK1 kernel, but I don’t know about the TX1 kernel. The option is a security option to randomize parts of memory layout to make stack overflows more difficult. Most virtual addressing is not affected by this, but mmap and some other memory allocations are. The result is that the maximum size of the memory chunk which is contiguous becomes much smaller. I’m wondering if perhaps this kernel option is enabled on TX1 and causing the smaller allocation size you are seeing…I just can’t remember which option this is off the top of my head.
I’m not positive, but this is probably what I was remembering. I see on my JTX1 that this is on…possibly the proc file wouldn’t even exist on a TK1 with the option not enabled by default (or perhaps it exists, but defaults to “0”). I’d recommend testing whether large enough chunks of contiguous memory can be allocated when this is set to “0”. Trouble is that I don’t know if a system which has been running with this on can simply be switched to a state with ASLR deactivated and have the next memory allocation be full sized…could be that a number of allocations would be required after deactivating this before larger contiguous chunks become available.
I see how this could impact things. But disabling this does not make it possible for me to use pci_alloc_consistent.
__get_free_pages() does however really seem to work in that the memory looks like it is of the right size.
extern phys_addr_t memstart_addr;
/* PHYS_OFFSET - the physical address of the start of memory. */
#define PHYS_OFFSET ({ memstart_addr; })
#ifdef CONFIG_ARM64_64K_PAGES
#define VA_BITS (42)
#else
#define VA_BITS (39)
#endif
#define PAGE_OFFSET (UL(0xffffffffffffffff) << (VA_BITS - 1))
/*
 * Physical vs virtual RAM address space conversion. These are
 * private definitions which should NOT be used outside memory.h
 * files. Use virt_to_phys/phys_to_virt/__pa/__va instead.
 */
#define __virt_to_phys(x) (((phys_addr_t)(x) - PAGE_OFFSET + PHYS_OFFSET))
/*
 * Note: Drivers should NOT use these. They are the wrong
 * translation for translating DMA addresses. Use the driver
 * DMA support - see dma-mapping.h.
 */
static inline phys_addr_t virt_to_phys(const volatile void *x)
{
	return __virt_to_phys((unsigned long)(x));
}
However, translating the address this way and writing to that location over PCIe seems to trash the place where the kernel lives, which seems to be near the end of the physical address space.
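To make the header arithmetic concrete, here is a user-space sketch that mirrors __virt_to_phys for VA_BITS = 39. The SKETCH_ names are made up for illustration, and the DRAM base of 0x80000000 is an assumption; on a real system PHYS_OFFSET comes from memstart_addr at boot. Note that the result is a CPU physical address. As the header comment itself warns, that is not guaranteed to be the address a PCIe endpoint has to use, which is why the DMA mapping API exists.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so VA_BITS is 39 as in the header excerpt. */
#define SKETCH_VA_BITS 39

/* Assumption: DRAM starts at 0x80000000 (stand-in for memstart_addr). */
#define SKETCH_PHYS_OFFSET UINT64_C(0x80000000)

/* Same expression as PAGE_OFFSET in the header: all ones shifted left. */
#define SKETCH_PAGE_OFFSET \
    (UINT64_C(0xffffffffffffffff) << (SKETCH_VA_BITS - 1))

/* User-space mirror of __virt_to_phys() for linear-map addresses. */
static uint64_t sketch_virt_to_phys(uint64_t vaddr)
{
    return vaddr - SKETCH_PAGE_OFFSET + SKETCH_PHYS_OFFSET;
}
```

With these assumptions SKETCH_PAGE_OFFSET is 0xffffffc000000000, and the kernel virtual address 0xffffffc000100000 translates to physical 0x80100000.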
I am MartijnBerger’s colleague, mainly responsible for the FPGA side of our application.
Today I took a dive into our driver code and have been able to debug and fix our problem. We were on the right track, but had to increase the size of the coherent memory and coherent pool in the kernel parameters, as advised in the TK1 topic on Large Coherent DMA blocks.
Then using pci_alloc_consistent(), as pointed out in #4, gave me both the correct CPU virtual address and the device (bus) address.
After some additional bookkeeping fixes I was able to compile and run our code correctly. The same driver now happily runs on both our Intel-based and Tegra-based platforms.
Thanks for your support!
Just to follow up on this original posting for anyone else having PCIe-to-endpoint DMA issues:
I spoke with Martin; he added
vmalloc=256M cma=128M coherent-pool=96M to the boot parameters to get dma_alloc_coherent to return a usable chunk.
This works for me as well.
The boot line parameters are added to /boot/extlinux.conf on the target.
The base extlinux.conf files are located at jestpack/TX1/Linux_for_Tegra_tx1/bootloader/t210ref/p2371-2180
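As a rough illustration, the kernel arguments line in extlinux.conf would end up looking something like the fragment below. APPEND is the standard extlinux keyword for kernel arguments; `<existing arguments>` is a placeholder for whatever your file already contains, not literal text.

```
APPEND <existing arguments> vmalloc=256M cma=128M coherent-pool=96M
```

After a reboot, the contents of /proc/cmdline should show the three added parameters if the change took effect.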
The address returned by dma_alloc_coherent through its dma_addr_t handle is the true PCI bus address for the endpoint to access memory from.
Here z7pcie->dmaBuf0_pba is declared as dma_addr_t and holds the 64-bit address that the endpoint DMA controller is set to access memory from. Note that the upper 32 address bits are zero for this address.
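For readers who want to see the shape of such a call, here is a minimal kernel-side sketch, not the actual driver code. Only the name dmaBuf0_pba comes from this thread. The struct layout, the bar0 mapping, the register offsets, and the function names are all hypothetical, and the 4 MiB size is taken from the chunk size mentioned earlier.

```c
#include <linux/pci.h>
#include <linux/dma-mapping.h>
#include <linux/io.h>
#include <linux/kernel.h>	/* lower_32_bits(), upper_32_bits() */

#define Z7_DMA_BUF_SIZE	(4 * 1024 * 1024)	/* 4 MiB, as discussed above */
#define Z7_DMA_ADDR_LO	0x00	/* hypothetical endpoint register offset */
#define Z7_DMA_ADDR_HI	0x04	/* hypothetical endpoint register offset */

/* Hypothetical per-device state; only dmaBuf0_pba is named in the thread. */
struct z7pcie_dev {
	struct pci_dev *pdev;
	void __iomem *bar0;	/* hypothetical: already-mapped register BAR */
	void *dmaBuf0;		/* CPU (kernel virtual) address of the buffer */
	dma_addr_t dmaBuf0_pba;	/* bus address handed to the endpoint */
};

static int z7pcie_alloc_dma(struct z7pcie_dev *z7pcie)
{
	struct device *dev = &z7pcie->pdev->dev;

	/* Returns the CPU address; the bus address is written to the handle. */
	z7pcie->dmaBuf0 = dma_alloc_coherent(dev, Z7_DMA_BUF_SIZE,
					     &z7pcie->dmaBuf0_pba, GFP_KERNEL);
	if (!z7pcie->dmaBuf0)
		return -ENOMEM;

	/* Program the endpoint with the bus address, never virt_to_phys(). */
	iowrite32(lower_32_bits(z7pcie->dmaBuf0_pba),
		  z7pcie->bar0 + Z7_DMA_ADDR_LO);
	iowrite32(upper_32_bits(z7pcie->dmaBuf0_pba),
		  z7pcie->bar0 + Z7_DMA_ADDR_HI);
	return 0;
}

static void z7pcie_free_dma(struct z7pcie_dev *z7pcie)
{
	dma_free_coherent(&z7pcie->pdev->dev, Z7_DMA_BUF_SIZE,
			  z7pcie->dmaBuf0, z7pcie->dmaBuf0_pba);
}
```

The driver touches the buffer through dmaBuf0, while the FPGA only ever sees dmaBuf0_pba. Splitting the handle with lower_32_bits/upper_32_bits keeps the code correct even on a platform where the upper half is not zero.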