We have developed a soc camera based V4L2 driver for the OmniVision OV4689 sensor on the Jetson TK1 board using the L4T 21.4 release. The driver works and we are able to capture images from the sensor.
The issue is that we are not able to allocate more than 8 buffers (via ioctl VIDIOC_REQBUFS). The image size in our case is 2688x1520x1 = 4085760 bytes.
The dmesg output is as follows:
[24824.048890] vmap allocation for size 4091904 failed: use vmalloc= to increase size.
[24824.057680] vi vi.0: dma_alloc_coherent of size 4087808 failed
Free memory available at the time of the error is 1.2 GB. The following is the output of cat /proc/meminfo:
MemTotal: 1939044 kB
MemFree: 1262268 kB
Is there any limitation on allocating buffers via VIDIOC_REQBUFS on Jetson?
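As a side note on the two sizes in the dmesg lines above: they are consistent with the 4085760-byte image being rounded up to whole 4096-byte pages (4087808), plus one extra page for the vmap request (4091904). The sketch below reproduces that arithmetic. The 4096-byte page size and the reading of the extra page as a guard page are assumptions inferred from the numbers, not something stated in the post, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of page_size (page_size must be a power of two). */
static size_t page_align(size_t size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}

/*
 * Virtual address span requested for one mapped buffer:
 * the page-aligned size plus one extra page (assumed to be a guard page).
 */
static size_t vmap_span(size_t size, size_t page_size)
{
    return page_align(size, page_size) + page_size;
}
```

With size = 2688 * 1520 and page_size = 4096, page_align() gives 4087808 and vmap_span() gives 4091904, matching the dma_alloc_coherent and vmap sizes in the log.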
It looks like the available vmalloc space has reached zero, which is quite possible.
Vmap uses vmalloc space.
You can check the values of VmallocTotal and VmallocUsed in “cat /proc/meminfo” and see whether (VmallocTotal - VmallocUsed) < 4MB. If it is, you may have run out of vmalloc space.
BTW, you can also find this information in the Linux references, but I do not have any quick links to hand.
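The check described above can be scripted. This is a minimal sketch that pulls VmallocTotal and VmallocUsed out of meminfo-style text and returns the difference in kB. It assumes the text has already been read from /proc/meminfo into a NUL-terminated buffer, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the kB value of a "Key:   12345 kB" line, or -1 if the key is absent. */
static long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;
    long value;

    assert(klen > 0);
    while (p != NULL && *p != '\0') {
        /* Match the key only at the start of a line, followed by ':'. */
        if (strncmp(p, key, klen) == 0 && p[klen] == ':') {
            if (sscanf(p + klen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        p = strchr(p, '\n');
        if (p != NULL)
            p++;                /* step past the newline to the next line */
    }
    return -1;
}

/* Free vmalloc address space in kB, or -1 if either field is missing. */
static long vmalloc_free_kb(const char *text)
{
    long total = meminfo_kb(text, "VmallocTotal");
    long used = meminfo_kb(text, "VmallocUsed");

    if (total < 0 || used < 0)
        return -1;
    return total - used;
}
```

A result below roughly 4096 kB would mean a 4 MB mapping cannot fit. A larger result does not by itself guarantee success, since the free space also has to be available as one piece.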
There is enough vmalloc memory available after running our application, which allocates 8 buffers for the v4l2 driver, but the next buffer allocation still fails.
See the data below.
Before running our application, the command ‘cat /proc/meminfo’ shows
Just confirming, since buffers go by page size (and it looks like on JTK1 size is 8192 per page), was the call for size 512 pages (approximately 4194304/8192 bytes)?
I’m not sure if I’m interpreting this correctly, but it seems that in your /proc/meminfo, after the app has allocated the known working 4MB buffers (8 of them), the information says things should work unless you need a contiguous block of memory:
"largest contiguous block of vmalloc area which is free"
…but unless something in your ioctl required contiguous memory (or indirectly a driver related to your ioctl), you have plenty of memory to work with. I’m wondering about my own understanding of vmalloc, as it was intended to provide remapped chunks of physical memory which are not contiguous, but presented virtually to appear contiguous. This makes me very curious about the real definition of VmallocChunk…why statistics from vmalloc would be kept under vmalloc for contiguous definitions just seems odd unless it is offering that value for performance reasons. Historically, it looks like vmalloc memory was stored in a linked list, which worked but which was low performance…and then got a performance boost via an rbtree redesign (which might be “smart” and allocate contiguous chunks when available, but just “do the remapping thing” when needed). My thought was that VmallocChunk is listed because this is the largest chunk which could be vmalloc’d and actually be fortunate enough to also be contiguous without the TLB table performance hits.
It isn’t conclusive, but I see a hint that perhaps something in the chain of allocation wants contiguous memory (4MB wanted, only a bit less than 4MB contiguous available), although one would think a vmalloc call does not demand this.
Yes, this is not conclusive. There is free space available in vmalloc, but vmalloc still fails because the contiguous memory reported in VmallocChunk is too small. vmalloc does not require physically contiguous memory.
The allocation is done by the v4l2 driver, so it presumably requires physically contiguous memory. But then the question is: why is the memory allocated via vmalloc?
The kernel message shows that dma_alloc_coherent fails. The dmesg output is:
[24824.048890] vmap allocation for size 4091904 failed: use vmalloc= to increase size.
[24824.057680] vi vi.0: dma_alloc_coherent of size 4087808 failed
Allocation of the v4l2 buffer is done in the file “drivers/media/v4l2-core/videobuf2-dma-contig.c”, in the function “vb2_dc_alloc”, using the kernel API dma_alloc_coherent.
The next question is: why does dma_alloc_coherent allocate memory using vmalloc?
I do not know what v4l2 requires (I’ve never looked at v4l2 code), but I guess it comes down to the question as to whether physically contiguous memory is required for any reason, or if vmalloc is not used for any reason (one reason is being driven in an interrupt context which isn’t allowed). I also wonder about the true meaning of that log of “use vmalloc= to increase size”.
There have been some similar reports of running out of the vmalloc memory for MythTV, which uses vmalloc and some of the nVidia-related hardware used with MythTV requires vmalloc space. If we ignore the VmallocChunk (as I think the documentation must be horribly wrong), and work on simply adding more vmalloc space, we have a starting vmalloc of this on my test Jetson:
Could you see if allocating more vmalloc space via kernel “APPEND” tag in /boot/extlinux/extlinux.conf allows you to succeed via append of “vmalloc=256M” (or via “vmalloc=264M”)?
dma_alloc_coherent() does not guarantee physically contiguous memory. When an IOMMU is present, it can map discontiguous physical regions into a single region in bus address space.
If dma_alloc_coherent does not guarantee physically contiguous memory, then why does the allocation fail even though enough non-contiguous memory is still available?
In our case, when it fails, we still have memory available in vmalloc (see the data below).
Are you sure that the vmalloc failure and the dma_alloc_coherent failure are related?
They could be separate calls failing. Can you check the dump_stack output for both failures?
[24824.048890] vmap allocation for size 4091904 failed: use vmalloc= to increase size.
[24824.057680] vi vi.0: dma_alloc_coherent of size 4087808 failed
Also, please print the memory info:
echo m > /proc/sysrq-trigger or use the sysrq key
The vmalloc and dma_alloc_coherent failures are related. As you can see in the stack dump, vmalloc is called inside dma_alloc_coherent (its name in the stack dump is arm_iommu_alloc_attrs). We have also provided the output of /proc/sysrq-trigger before and after the failure.
I see your log. Vmalloc is used because vi is behind the IOMMU, or I would say IOMMU enabled. If a device uses the IOMMU, then pages are allocated using alloc_page() and remapped into a vmap area.
Your question is why vmalloc is not allocating memory from the 53 MB remaining. The answer is in
cat /proc/vmallocinfo
0xf0000000-0xf0002000 8192 rtl_init_one+0x2b8/0xd78 phys=32100000 ioremap
0xf0002000-0xf0004000 8192 pool_alloc_page+0x74/0xf0 pages=1 user
.
.
.
0xfee00000-0xff000000 2097152 pci_reserve_io+0x0/0x30 ioremap
The first column is the virtual address range.
Vmalloc allocates virtually contiguous memory. No virtually contiguous 4 MB region is available, so the allocation fails.
VmallocTotal: total size of vmalloc memory area
VmallocUsed: amount of vmalloc area which is used
VmallocChunk: largest contiguous block of vmalloc area which is free
Code here:
void get_vmalloc_info(struct vmalloc_info *vmi)
{
    struct vmap_area *va;
    unsigned long free_area_size;
    unsigned long prev_end;

    vmi->used = 0;
    vmi->largest_chunk = 0;
    prev_end = VMALLOC_START;

    spin_lock(&vmap_area_lock);

    if (list_empty(&vmap_area_list)) {
        vmi->largest_chunk = VMALLOC_TOTAL;
        goto out;
    }

    list_for_each_entry(va, &vmap_area_list, list) {
        unsigned long addr = va->va_start;

        /*
         * Some archs keep another range for modules in vmalloc space
         */
        if (addr < VMALLOC_START)
            continue;
        if (addr >= VMALLOC_END)
            break;

        if (va->flags & (VM_LAZY_FREE | VM_LAZY_FREEING))
            continue;

        vmi->used += (va->va_end - va->va_start);

        free_area_size = addr - prev_end;
        if (vmi->largest_chunk < free_area_size)
            vmi->largest_chunk = free_area_size;

        prev_end = va->va_end;
    }

    if (VMALLOC_END - prev_end > vmi->largest_chunk)
        vmi->largest_chunk = VMALLOC_END - prev_end;

out:
    spin_unlock(&vmap_area_lock);
}
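To make the point concrete, here is a rough user-space analogue of that walk. It is only a sketch, not kernel code: the struct and function names are hypothetical, and it assumes the in-use ranges (for example, parsed from the first column of /proc/vmallocinfo) are already sorted and non-overlapping. It reports both the total free space and the largest single gap, which is the quantity that decides whether a request can be placed.

```c
#include <assert.h>
#include <stddef.h>

/* One in-use region of the address window, as [start, end). */
struct range {
    unsigned long start;
    unsigned long end;
};

/*
 * Walk in-use ranges (sorted by start, non-overlapping, inside [lo, hi))
 * and report the total free space and the largest single free gap.
 */
static void scan_gaps(const struct range *used, size_t n,
                      unsigned long lo, unsigned long hi,
                      unsigned long *total_free, unsigned long *largest_gap)
{
    unsigned long prev_end = lo;
    unsigned long gap;
    size_t i;

    *total_free = 0;
    *largest_gap = 0;

    for (i = 0; i < n; i++) {
        assert(used[i].start >= prev_end);
        assert(used[i].end >= used[i].start && used[i].end <= hi);

        gap = used[i].start - prev_end;     /* hole before this range */
        *total_free += gap;
        if (gap > *largest_gap)
            *largest_gap = gap;
        prev_end = used[i].end;
    }

    gap = hi - prev_end;                    /* hole after the last range */
    *total_free += gap;
    if (gap > *largest_gap)
        *largest_gap = gap;
}

/* A request fits only if a single gap is at least as large as the request. */
static int request_fits(const struct range *used, size_t n,
                        unsigned long lo, unsigned long hi,
                        unsigned long size)
{
    unsigned long total_free, largest_gap;

    scan_gaps(used, n, lo, hi, &total_free, &largest_gap);
    return largest_gap >= size;
}
```

For instance, with a hypothetical 16 MB window that contains four separate 3 MB holes, total_free comes out as 12 MB, yet request_fits() returns 0 for a 4091904-byte request because no single hole is large enough.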
If you don't need a CPU mapping, explicitly call dma_alloc_attrs() instead of dma_alloc_coherent().
DMA_ATTR_NO_KERNEL_MAPPING: if you use this attribute, vmalloc space is not used.
As per the vmalloc description, vmalloc allocates memory that is only virtually contiguous and not necessarily physically contiguous. vmalloc makes physically non-contiguous pages contiguous in the virtual address space by setting up the page table entries. In this scenario we need 4MB of memory, so is it not possible to allocate two buffers internally (e.g. 2MB each) and then make them virtually contiguous?
Or do you mean that even though free memory is available in vmalloc, vmalloc will not allocate more than VmallocChunk in a single call?
No, that is not possible. If you look at this, the first column is virtual memory only. When there is no contiguous 4 MB of virtual address space left, vmalloc cannot stitch together two separate 2 MB virtually contiguous regions to make 4 MB.
root@tegra-ubuntu:/home/ubuntu# cat /proc/vmallocinfo
0xf0000000-0xf0002000 8192 rtl_init_one+0x2b8/0xd78 phys=32100000 ioremap
0xf0002000-0xf0004000 8192 pool_alloc_page+0x74/0xf0 pages=1 user
0xf0004000-0xf0007000 12288 pcpu_extend_area_map+0x20/0xa8 pages=2 vmalloc
0xf0007000-0xf001f000 98304 dmam_alloc_coherent+0x80/0xc4 pages=23 user
0xf001f000-0xf0024000 20480 tegra_spi_init_dma_param+0x84/0x1cc pages=4 user
0xf0024000-0xf0029000 20480 tegra_spi_init_dma_param+0x84/0x1cc pages=4 user
0xf0029000-0xf002e000 20480 tegra_spi_init_dma_param+0x84/0x1cc pages=4 user
0xf002e000-0xf0033000 20480 tegra_spi_init_dma_param+0x84/0x1cc pages=4 user
0xf0033000-0xf0035000 8192 pool_alloc_page+0x74/0xf0 pages=1 user
0xf0035000-0xf0037000 8192 pool_alloc_page+0x74/0xf0 pages=1 user