Not able to allocate more than 8 V4L2 buffers

We have developed a soc_camera-based V4L2 driver for the OmniVision OV4689 sensor on the Jetson TK1 board using the L4T 21.4 release. The driver works and we are able to capture images from the sensor.

The issue is that we are not able to allocate more than 8 buffers (via the VIDIOC_REQBUFS ioctl). The image size in our case is 2688 x 1520 x 1 = 4085760 bytes.

The dmesg output is as follows:

[24824.048890] vmap allocation for size 4091904 failed: use vmalloc= to increase size.
[24824.057680] vi vi.0: dma_alloc_coherent of size 4087808 failed

Free memory available at the time of the error is 1.2 GB. The following is the output of cat /proc/meminfo:
MemTotal: 1939044 kB
MemFree: 1262268 kB

Is there any limit on allocating buffers via VIDIOC_REQBUFS on the Jetson?
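
A quick arithmetic check on the two sizes in the dmesg output above. This is only a sketch: it assumes 4 KiB pages and one extra guard page added to each vmalloc area. Neither assumption is stated in the thread, but together they reproduce both logged sizes.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on this kernel


def page_align(nbytes, page_size=PAGE_SIZE):
    """Round nbytes up to the next multiple of page_size."""
    return (nbytes + page_size - 1) // page_size * page_size


# One byte per pixel, as given in the post.
image_bytes = 2688 * 1520 * 1

# Size reported in "dma_alloc_coherent of size ... failed".
dma_bytes = page_align(image_bytes)

# Size reported in "vmap allocation for size ... failed".
# Assumption: the vmalloc area carries one extra guard page.
vmap_bytes = dma_bytes + PAGE_SIZE

print(image_bytes, dma_bytes, vmap_bytes)
# prints: 4085760 4087808 4091904
```

So each buffer needs a little under 4 MiB of virtually contiguous vmalloc address space, not just 4 MiB of free RAM.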

Hi PratikPatel,

It looks like the available vmalloc space has run out, which is quite possible.

Vmap uses vmalloc space.

You can check the values of VmallocTotal and VmallocUsed in “cat /proc/meminfo” and see if the (VmallocTotal – VmallocUsed) < 4MB or not. If it is, then it means that you may run out of vmalloc space.

BTW, you also can find this information in Linux references. But I do not have any quick links to provide more information about related references.

Thanks

Here’s a nice vmalloc reference:
http://www.makelinux.net/books/lkd2/ch11lev1sec5

Hi Kaycc and linux dev,

There is enough vmalloc memory available after running our application, which allocates 8 buffers for the V4L2 driver, but the next buffer allocation still fails.

See the data below.

Before running our application, ‘cat /proc/meminfo’ shows:

VmallocTotal: 245760 kB
VmallocUsed: 184740 kB
VmallocChunk: 16380 kB

Free: 61020 kB

After running our application (after allocating 8 V4L2 buffers of 4 MB each), ‘cat /proc/meminfo’ shows:

VmallocTotal: 245760 kB
VmallocUsed: 192732 kB
VmallocChunk: 3912 kB

Free: 53028 kB

I am curious about what size (number of pages) is being requested when it fails? Is it a single page?

The allocation fails when we request a 4 MB buffer using the V4L2 ioctl VIDIOC_REQBUFS.

Just confirming, since buffers go by page size (and it looks like on JTK1 the size is 8192 per page), was the call for 512 pages (approximately 4194304/8192)?

I’m not sure if I’m interpreting this correctly, but it seems that in your /proc/meminfo after the app has allocated the known working 4MB buffers (8 of them), that this information says things should work unless you need a contiguous block of memory:

VmallocTotal: 245760 kB
VmallocUsed: 192732 kB
VmallocChunk: 3912 kB

Documentation says the VmallocChunk is the…

"largest contigious block of vmalloc area which is free"

…but unless something in your ioctl required contiguous memory (or indirectly a driver related to your ioctl), you have plenty of memory to work with. I’m wondering about my own understanding of vmalloc, as it was intended to provide remapped chunks of physical memory which are not contiguous, but presented virtually to appear contiguous. This makes me very curious about the real definition of VmallocChunk…why statistics from vmalloc would be kept under vmalloc for contiguous definitions just seems odd unless it is offering that value for performance reasons. Historically, it looks like vmalloc memory was stored in a linked list, which worked but which was low performance…and then got a performance boost via an rbtree redesign (which might be “smart” and allocate contiguous chunks when available, but just “do the remapping thing” when needed). My thought was that VmallocChunk is listed because this is the largest chunk which could be vmalloc’d and actually be fortunate enough to also be contiguous without the TLB table performance hits.

It isn’t conclusive, but I see a hint that perhaps something in the chain of allocation wants contiguous memory (4MB wanted, only a bit less than 4MB contiguous available), although one would think a vmalloc call does not demand this.
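
The hint above can be checked numerically. The sketch below parses the Vmalloc* lines of /proc/meminfo and compares a request against both the total free vmalloc space and the largest contiguous chunk. The function names are made up for illustration, the sample text is the "after" data posted earlier, and the request size is the one from the "vmap allocation for size 4091904 failed" line.

```python
import re


def parse_vmalloc_kb(meminfo_text):
    """Return the Vmalloc* fields of /proc/meminfo as {name: kB}."""
    pattern = r"^(Vmalloc\w+):\s+(\d+)\s*kB"
    return {name: int(kb) for name, kb in re.findall(pattern, meminfo_text, re.M)}


def vmalloc_report(meminfo_text, request_bytes):
    """Compare a request against free vmalloc space and the largest chunk."""
    v = parse_vmalloc_kb(meminfo_text)
    request_kb = -(-request_bytes // 1024)  # round up to whole kB
    free_kb = v["VmallocTotal"] - v["VmallocUsed"]
    return {
        "request_kb": request_kb,
        "free_kb": free_kb,
        "chunk_kb": v["VmallocChunk"],
        "fits_in_free": request_kb <= free_kb,
        "fits_in_chunk": request_kb <= v["VmallocChunk"],
    }


# Values posted in the thread after the 8 buffers were allocated.
SAMPLE = """VmallocTotal: 245760 kB
VmallocUsed: 192732 kB
VmallocChunk: 3912 kB
"""

report = vmalloc_report(SAMPLE, 4091904)
print(report)
```

With these numbers the request is 3996 kB, free space is 53028 kB, and the largest chunk is 3912 kB, so the request fits in the total free space but not in the largest contiguous chunk. On the board itself, pass open("/proc/meminfo").read() instead of SAMPLE.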

Yes, this is not conclusive. There is free space available in vmalloc, but the allocation still fails because the contiguous memory available in VmallocChunk is too small. vmalloc does not require physically contiguous memory.

The allocation is done by the V4L2 driver, so it must require physically contiguous memory. But then the question is: why is the memory allocated via vmalloc?

The kernel message shows that the function dma_alloc_coherent fails. The dmesg output is as follows:

[24824.048890] vmap allocation for size 4091904 failed: use vmalloc= to increase size.
[24824.057680] vi vi.0: dma_alloc_coherent of size 4087808 failed

The V4L2 buffer is allocated in the file “drivers/media/v4l2-core/videobuf2-dma-contig.c”, in the function “vb2_dc_alloc”, using the kernel API dma_alloc_coherent.

The next question is: why does dma_alloc_coherent allocate memory using vmalloc?

I do not know what v4l2 requires (I’ve never looked at v4l2 code), but I guess it comes down to the question as to whether physically contiguous memory is required for any reason, or if vmalloc is not used for any reason (one reason is being driven in an interrupt context which isn’t allowed). I also wonder about the true meaning of that log of “use vmalloc= to increase size”.

There have been some similar reports of running out of the vmalloc memory for MythTV, which uses vmalloc and some of the nVidia-related hardware used with MythTV requires vmalloc space. If we ignore the VmallocChunk (as I think the documentation must be horribly wrong), and work on simply adding more vmalloc space, we have a starting vmalloc of this on my test Jetson:

VmallocTotal:     245760 kB
VmallocUsed:      169460 kB
VmallocChunk:      31824 kB

Then I added a duplicate of the default boot entry in /boot/extlinux/extlinux.conf on the Jetson (which I named “vmalloc”), with this appended to the existing “APPEND” tag:

vmalloc=256M

The resulting new vmalloc shows up with these numbers in meminfo:

VmallocTotal:     253952 kB
VmallocUsed:      171348 kB
VmallocChunk:      25340 kB

Could you see if allocating more vmalloc space via kernel “APPEND” tag in /boot/extlinux/extlinux.conf allows you to succeed via append of “vmalloc=256M” (or via “vmalloc=264M”)?

NOTE: The description I found about this is here:
https://www.mythtv.org/wiki/Common_Problem:_vmalloc_too_small
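
As an illustration of the edit described above, here is a minimal sketch that adds or replaces a vmalloc= argument on an APPEND line. The helper name and the sample APPEND line are hypothetical; a real extlinux.conf entry carries many more kernel arguments, and the file should be backed up before it is changed.

```python
def with_vmalloc(append_line, size="256M"):
    """Return the APPEND line with vmalloc=<size> set exactly once.

    Any existing vmalloc= token is dropped first. Leading indentation is
    kept; runs of spaces between arguments are collapsed to one space.
    """
    indent = append_line[: len(append_line) - len(append_line.lstrip())]
    tokens = [t for t in append_line.split() if not t.startswith("vmalloc=")]
    tokens.append("vmalloc=" + size)
    return indent + " ".join(tokens)


# Hypothetical, shortened APPEND line used only for demonstration.
example = "      APPEND console=ttyS0,115200n8 root=/dev/mmcblk0p1 rw rootwait"

print(with_vmalloc(example))
print(with_vmalloc(example + " vmalloc=256M", size="264M"))
```

The second call shows that an existing vmalloc= value is replaced rather than duplicated.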

After increasing the vmalloc space to 360 MB in the kernel bootargs, we are able to allocate more than 8 V4L2 buffers.

For V4L2, physically contiguous memory is required, as it will be used by the VI2 hardware for storing data in memory.

That is why the dma_alloc_coherent function is used to allocate the memory.

The question is: why does dma_alloc_coherent use vmalloc memory? As far as I know, dma_alloc_coherent is used to get a big physically contiguous block of memory.

Hi PratikPatel,

dma_alloc_coherent() does not guarantee physically contiguous memory. When an IOMMU is present, it can map discontiguous physical regions into a single region in bus address space.

Please refer to https://www.kernel.org/doc/Documentation/DMA-API.txt and http://www.linuxjournal.com/article/7104 for more details.

Thanks

If dma_alloc_coherent does not guarantee physically contiguous memory, then why does the allocation fail even though enough non-contiguous memory is still available?

In our case, when it fails, we still have memory available in vmalloc (see the data below):

VmallocTotal: 245760 kB
VmallocUsed: 192732 kB
VmallocChunk: 3912 kB

Free: 53028 kB

There is 53 MB available in vmalloc, but it still fails when it tries to allocate 4 MB using dma_alloc_coherent.

Pratik,

Are you sure that the vmalloc failure and the dma_alloc_coherent failure are related?
They could be separate calls failing. Can you check the dump_stack output during both failures?

[24824.048890] vmap allocation for size 4091904 failed: use vmalloc= to increase size.
[24824.057680] vi vi.0: dma_alloc_coherent of size 4087808 failed

Also, please print the memory info:
echo m > /proc/sysrq-trigger, or use the SysRq key

regards
Bibek

hi bbasu,

The vmalloc and dma_alloc_coherent failures are related. As you can see in the stack dump, the vmap allocation happens inside dma_alloc_coherent (which appears in the stack dump as arm_iommu_alloc_attrs). We have also provided the output of /proc/sysrq-trigger before and after the failure.

================= Before allocating v4l2 buffer =============
SysRq : Show Memory
Mem-info:
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 43
CPU 1: hi: 186, btch: 31 usd: 119
CPU 2: hi: 186, btch: 31 usd: 76
CPU 3: hi: 186, btch: 31 usd: 46
HighMem per-cpu:
CPU 0: hi: 186, btch: 31 usd: 161
CPU 1: hi: 186, btch: 31 usd: 110
CPU 2: hi: 186, btch: 31 usd: 30
CPU 3: hi: 186, btch: 31 usd: 46
active_anon:4806 inactive_anon:68 isolated_anon:0
active_file:2361 inactive_file:14570 isolated_file:0
unevictable:853 dirty:0 writeback:0 unstable:0
free:447241 slab_reclaimable:1714 slab_unreclaimable:3713
mapped:5119 shmem:84 pagetables:131 bounce:0
free_cma:4026
Normal free:709132kB min:3408kB low:4260kB high:5112kB active_anon:0kB inactive_anon:0kB active_file:1828kB inactive_file:2796kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:778240kB managed:725984kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:6856kB slab_unreclaimable:14852kB kernel_stack:992kB pagetables:524kB unstable:0kB bounce:0kB free_cma:16104kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve: 0 9320 9320
HighMem free:1079832kB min:512kB low:1912kB high:3312kB active_anon:19224kB inactive_anon:272kB active_file:7616kB inactive_file:55484kB unevictable:3412kB isolated(anon):0kB isolated(file):0kB present:1192960kB managed:1192960kB mlocked:3412kB dirty:0kB writeback:0kB mapped:20472kB shmem:336kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve: 0 0 0
Normal: 5*4kB (EM) 6*8kB (UMC) 8*16kB (UEM) 3*32kB (EMC) 3*64kB (MC) 4*128kB (MC) 4*256kB (M) 5*512kB (MC) 6*1024kB (UEMC) 5*2048kB (UMC) 168*4096kB (MRC) = 709092kB
HighMem: 0*4kB 1*8kB (U) 1*16kB (M) 0*32kB 0*64kB 2*128kB (UM) 1*256kB (U) 0*512kB 2*1024kB (UM) 0*2048kB 263*4096kB (MR) = 1079832kB
17693 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
515840 pages of RAM
448161 free pages
31079 reserved pages
5435 slab pages
273224 pages shared
0 pages swap cached

===== After allocating 8 v4l2 buffers =========

[ 113.435076] vmap allocation for size 4091904 failed: use vmalloc= to increase size.
[ 113.443258] CPU: 0 PID: 1199 Comm: mep_sensor_test Not tainted 3.10.40 #16
[ 113.450164] [] (unwind_backtrace+0x0/0x134) from [] (show_stack+0x18/0x1c)
[ 113.458788] [] (show_stack+0x18/0x1c) from [] (alloc_vmap_area.isra.32+0x2c8/0x2d0)
[ 113.468220] [] (alloc_vmap_area.isra.32+0x2c8/0x2d0) from [] (__get_vm_area_node.isra.33+0xa4/0x1a8)
[ 113.479254] [] (__get_vm_area_node.isra.33+0xa4/0x1a8) from [] (get_vm_area_caller+0x48/0x50)
[ 113.489511] [] (get_vm_area_caller+0x48/0x50) from [] (arm_iommu_alloc_attrs+0x368/0x4c8)
[ 113.499444] [] (arm_iommu_alloc_attrs+0x368/0x4c8) from [] (vb2_dc_alloc+0x78/0x108 [videobuf2_dma_contig])
[ 113.510920] [] (vb2_dc_alloc+0x78/0x108 [videobuf2_dma_contig]) from [] (__vb2_queue_alloc+0xf8/0x42c)
[ 113.521988] [] (__vb2_queue_alloc+0xf8/0x42c) from [] (__reqbufs.isra.10+0x104/0x260)
[ 113.531552] [] (__reqbufs.isra.10+0x104/0x260) from [] (soc_camera_reqbufs+0xa8/0xc8)
[ 113.541110] [] (soc_camera_reqbufs+0xa8/0xc8) from [] (__video_do_ioctl+0x274/0x328)
[ 113.550599] [] (__video_do_ioctl+0x274/0x328) from [] (video_usercopy+0x1b8/0x440)
[ 113.559915] [] (video_usercopy+0x1b8/0x440) from [] (v4l2_ioctl+0x148/0x168)
[ 113.568698] [] (v4l2_ioctl+0x148/0x168) from [] (do_vfs_ioctl+0x3f4/0x5b4)
[ 113.577331] [] (do_vfs_ioctl+0x3f4/0x5b4) from [] (SyS_ioctl+0x58/0x168)
[ 113.585760] [] (SyS_ioctl+0x58/0x168) from [] (ret_fast_syscall+0x0/0x30)
[ 113.594594] vi vi.0: dma_alloc_coherent of size 4087808 failed

===== output of /proc/sysrq-trigger after failure of dma_alloc_coherent =========
SysRq : Show Memory
Mem-info:
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 33
CPU 1: hi: 186, btch: 31 usd: 127
CPU 2: hi: 186, btch: 31 usd: 71
CPU 3: hi: 186, btch: 31 usd: 47
HighMem per-cpu:
CPU 0: hi: 186, btch: 31 usd: 183
CPU 1: hi: 186, btch: 31 usd: 118
CPU 2: hi: 186, btch: 31 usd: 28
CPU 3: hi: 186, btch: 31 usd: 30
active_anon:4834 inactive_anon:67 isolated_anon:0
active_file:2364 inactive_file:14577 isolated_file:0
unevictable:853 dirty:0 writeback:1 unstable:0
free:439173 slab_reclaimable:1720 slab_unreclaimable:3719
mapped:13108 shmem:85 pagetables:157 bounce:0
free_cma:4026
Normal free:708944kB min:3408kB low:4260kB high:5112kB active_anon:0kB inactive_anon:0kB active_file:1836kB inactive_file:2796kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:778240kB managed:725984kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:6880kB slab_unreclaimable:14876kB kernel_stack:1000kB pagetables:628kB unstable:0kB bounce:0kB free_cma:16104kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve: 0 9320 9320
HighMem free:1047748kB min:512kB low:1912kB high:3312kB active_anon:19336kB inactive_anon:268kB active_file:7620kB inactive_file:55512kB unevictable:3412kB isolated(anon):0kB isolated(file):0kB present:1192960kB managed:1192960kB mlocked:3412kB dirty:0kB writeback:4kB mapped:52428kB shmem:340kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve: 0 0 0
Normal: 6*4kB (UEM) 3*8kB (UMC) 4*16kB (EM) 3*32kB (EMC) 4*64kB (UMC) 5*128kB (UMC) 5*256kB (UM) 6*512kB (UMC) 5*1024kB (EMC) 5*2048kB (UMC) 168*4096kB (MRC) = 708944kB
HighMem: 55*4kB (UM) 25*8kB (UM) 14*16kB (UM) 6*32kB (UM) 0*64kB 1*128kB (U) 1*256kB (U) 0*512kB 2*1024kB (UM) 2*2048kB (U) 254*4096kB (MR) = 1047748kB
17704 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
515840 pages of RAM
440112 free pages
31079 reserved pages
5431 slab pages
281002 pages shared
0 pages swap cached

Hi Pratik,

I have looked at your log. vmalloc space is used because VI is behind the IOMMU, or rather, the IOMMU is enabled for it. If a device uses the IOMMU, the pages are allocated with alloc_page() and then remapped into a vmap area.

Your question is why vmalloc is not allocating memory from the remaining 53 MB. The answer is in the output of:
cat /proc/vmallocinfo
0xf0000000-0xf0002000 8192 rtl_init_one+0x2b8/0xd78 phys=32100000 ioremap
0xf0002000-0xf0004000 8192 pool_alloc_page+0x74/0xf0 pages=1 user
.
.
.
0xfee00000-0xff000000 2097152 pci_reserve_io+0x0/0x30 ioremap

The first column is the virtual address range.

vmalloc allocates virtually contiguous memory. No virtually contiguous 4 MB range is available, so the allocation fails.

[ 0.000000] vmalloc : 0xf0000000 - 0xff000000 ( 240 MB)

cat /proc/meminfo | grep Vmalloc

VmallocTotal: 245760 kB
VmallocUsed: 192732 kB
VmallocChunk: 3912 kB

VmallocTotal: total size of vmalloc memory area
VmallocUsed: amount of vmalloc area which is used
VmallocChunk: largest contigious block of vmalloc area which is free

Code here:
void get_vmalloc_info(struct vmalloc_info *vmi)
{
	struct vmap_area *va;
	unsigned long free_area_size;
	unsigned long prev_end;

	vmi->used = 0;
	vmi->largest_chunk = 0;

	prev_end = VMALLOC_START;

	spin_lock(&vmap_area_lock);

	if (list_empty(&vmap_area_list)) {
		vmi->largest_chunk = VMALLOC_TOTAL;
		goto out;
	}

	list_for_each_entry(va, &vmap_area_list, list) {
		unsigned long addr = va->va_start;

		/*
		 * Some archs keep another range for modules in vmalloc space
		 */
		if (addr < VMALLOC_START)
			continue;
		if (addr >= VMALLOC_END)
			break;

		if (va->flags & (VM_LAZY_FREE | VM_LAZY_FREEING))
			continue;

		vmi->used += (va->va_end - va->va_start);

		free_area_size = addr - prev_end;
		if (vmi->largest_chunk < free_area_size)
			vmi->largest_chunk = free_area_size;

		prev_end = va->va_end;
	}

	if (VMALLOC_END - prev_end > vmi->largest_chunk)
		vmi->largest_chunk = VMALLOC_END - prev_end;

out:
	spin_unlock(&vmap_area_lock);
}

If you don't want a CPU mapping, explicitly call dma_alloc_attrs() instead of dma_alloc_coherent().
DMA_ATTR_NO_KERNEL_MAPPING: if you use this attribute, vmalloc space is not used.

regards
bibek
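
The scan done by get_vmalloc_info() above can be approximated from user space by walking /proc/vmallocinfo and measuring the holes between consecutive ranges. This is a rough sketch, not the kernel's implementation: the function name is made up, and the start and end addresses are taken from the "vmalloc : 0xf0000000 - 0xff000000" boot line quoted in the post. The sample contains only the three entries shown in the post, with the elided middle of the listing left out, so the printed gap is not the real one for that system.

```python
import re

# Matches the leading "0xSTART-0xEND" address range of a vmallocinfo line.
RANGE_RE = re.compile(r"^0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\s")


def largest_free_gap(vmallocinfo_text, start, end):
    """Return the largest unused virtual range in [start, end), in bytes."""
    areas = []
    for line in vmallocinfo_text.splitlines():
        m = RANGE_RE.match(line.strip())
        if m:
            areas.append((int(m.group(1), 16), int(m.group(2), 16)))

    largest = 0
    prev_end = start
    for lo, hi in sorted(areas):
        if lo < start:   # e.g. a module range below the vmalloc area
            continue
        if lo >= end:
            break
        largest = max(largest, lo - prev_end)
        prev_end = max(prev_end, hi)
    # Also consider the hole between the last area and the end.
    return max(largest, end - prev_end)


VMALLOC_START = 0xF0000000  # from the boot log line quoted in the post
VMALLOC_END = 0xFF000000

# Only the entries shown in the post; the elided middle is omitted.
SAMPLE = """0xf0000000-0xf0002000 8192 rtl_init_one+0x2b8/0xd78 phys=32100000 ioremap
0xf0002000-0xf0004000 8192 pool_alloc_page+0x74/0xf0 pages=1 user
0xfee00000-0xff000000 2097152 pci_reserve_io+0x0/0x30 ioremap
"""

gap = largest_free_gap(SAMPLE, VMALLOC_START, VMALLOC_END)
print(hex(gap))
```

On the board, feed it the full text of /proc/vmallocinfo (reading it typically requires root). If the result is smaller than the vmap size in the dmesg line, the allocation cannot succeed no matter how much total vmalloc space is free.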

hi bibek,

Thanks for the detailed explanation.

As per the vmalloc description, vmalloc allocates memory that is only virtually contiguous and not necessarily physically contiguous. vmalloc makes physically non-contiguous pages contiguous in the virtual address space by setting up the page table entries. In this scenario we need 4 MB of memory, so is it not possible to internally allocate two buffers (e.g. 2 MB each) and then make them virtually contiguous?

Or do you mean that even though free memory is available in vmalloc, vmalloc will not allocate more than VmallocChunk in a single call?

No, that is not possible. As you can see, the first column is virtual memory only. When there is no contiguous 4 MB range of virtual memory left, vmalloc cannot stitch together two virtually contiguous 2 MB ranges to make 4 MB.

root@tegra-ubuntu:/home/ubuntu# cat /proc/vmallocinfo
0xf0000000-0xf0002000 8192 rtl_init_one+0x2b8/0xd78 phys=32100000 ioremap
0xf0002000-0xf0004000 8192 pool_alloc_page+0x74/0xf0 pages=1 user
0xf0004000-0xf0007000 12288 pcpu_extend_area_map+0x20/0xa8 pages=2 vmalloc
0xf0007000-0xf001f000 98304 dmam_alloc_coherent+0x80/0xc4 pages=23 user
0xf001f000-0xf0024000 20480 tegra_spi_init_dma_param+0x84/0x1cc pages=4 user
0xf0024000-0xf0029000 20480 tegra_spi_init_dma_param+0x84/0x1cc pages=4 user
0xf0029000-0xf002e000 20480 tegra_spi_init_dma_param+0x84/0x1cc pages=4 user
0xf002e000-0xf0033000 20480 tegra_spi_init_dma_param+0x84/0x1cc pages=4 user
0xf0033000-0xf0035000 8192 pool_alloc_page+0x74/0xf0 pages=1 user
0xf0035000-0xf0037000 8192 pool_alloc_page+0x74/0xf0 pages=1 user

The answer to your second question is yes.

Thanks, bibek, for the clarification.