V4L2 capture jitter problem

I will apply this patch next week and provide you with the results. Thanks for the follow-up.

I’ve been able to try out this change. Note that I needed to add a parameter to get this to compile: “__flush_dcache_all(NULL);”.

This change induces a consistent 4ms delay across the capture period. I’ve taken one recording of a 10000-sample period (~20 minutes) and observed two instances where, in addition to the 4ms delay, there is ~1ms of jitter.

Thanks for your report.
Could you also try this patch?

From b1e5c1e716a33d317c9a82aa04eec76de46593df Mon Sep 17 00:00:00 2001
From: Puneet Saxena <puneets@nvidia.com>
Date: Fri, 25 Sep 2020 00:11:56 +0530
Subject: [PATCH] DNI: arch: arm64: dma: use HW flush in place of invalidate/clean

Use tegra_flush_dcache_all which uses SCF to flush complete
cache hierarchy

Bug 200652069

Change-Id: I51df21f17eb42ee566f95fc5b886c9bcd9e1caed
Signed-off-by: Puneet Saxena <puneets@nvidia.com>
---

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 6ca1f43..b5e55ab 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -30,6 +30,7 @@
 #include <linux/vmalloc.h>
 #include <linux/swiotlb.h>
 #include <linux/pci.h>
+#include <linux/tegra-cache.h>
 
 #include <asm/cacheflush.h>
 
@@ -863,6 +864,7 @@
 				    struct scatterlist *sgl, int nelems,
 				    enum dma_data_direction dir)
 {
+#if 0
 	struct scatterlist *sg;
 	int i;
 
@@ -872,12 +874,16 @@
 	for_each_sg(sgl, sg, nelems, i)
 		__dma_unmap_area_no_dsb(sg_virt(sg), sg->length, dir);
 	dsb(sy);
+#endif
+	void *unused = NULL;
+	tegra_flush_dcache_all(unused);
 }
 
 static void __iommu_sync_sg_for_device(struct device *dev,
 				       struct scatterlist *sgl, int nelems,
 				       enum dma_data_direction dir)
 {
+#if 0
 	struct scatterlist *sg;
 	int i;
 
@@ -887,6 +893,9 @@
 	for_each_sg(sgl, sg, nelems, i)
 		__dma_map_area_no_dsb(sg_virt(sg), sg->length, dir);
 	dsb(sy);
+#endif
+	void *unused = NULL;
+	tegra_flush_dcache_all(unused);
 }
 
 static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,

I needed to change the “#include” to reference tegra-mce.h as I don’t have a tegra-cache.h in my version of kernel. Running this, I see similar behavior – consistent 4ms delay overall and instances of 1ms jitter.

One other data point – I record buffer timestamp deltas (as frames are captured and are passed from VI → V4L2) and deltas as frames are passed to the application (V4L2 → application). With these DCACHE flush operations, I’m seeing the 1ms jitter appear in the frame capture buffer timestamps, as things are passed from VI → V4L2. This jitter is reflected in the V4L2 → application operations. Prior to these changes, the jitter was only detected in the V4L2 → application operation. I observed no jitter in the VI → V4L2 timestamp recording.

Here are plots of my captures for a frame of reference. I’m including the 1ms jitter observed with the DCACHE flushing operations, and then the 2ms jitter observed with the original dma-mapping.c file.

The Y-axis in these plots is the delta time between frames. We capture using a 10 fps trigger, so the expected delta is 100ms (100000us) between captures. The X-axis is the frame number.
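
For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of how the frame deltas can be checked. It assumes the timestamps have already been collected into an array of microsecond values. The function names and the tolerance parameter are hypothetical and are not taken from the capture code used for these plots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count inter-frame deltas that deviate from the expected period by more
 * than tolerance_us. ts_us holds frame timestamps in microseconds.
 * (Hypothetical helper, for illustration only.)
 */
size_t count_jitter_events(const int64_t *ts_us, size_t n,
                           int64_t period_us, int64_t tolerance_us)
{
    size_t events = 0;

    assert(ts_us != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        int64_t deviation = (ts_us[i] - ts_us[i - 1]) - period_us;

        if (deviation < 0)
            deviation = -deviation;
        if (deviation > tolerance_us)
            events++;
    }
    return events;
}

/*
 * Mean inter-frame delta in microseconds (integer division).
 * Returns 0 when fewer than two timestamps are available.
 */
int64_t mean_delta_us(const int64_t *ts_us, size_t n)
{
    assert(ts_us != NULL || n == 0);

    if (n < 2)
        return 0;
    return (ts_us[n - 1] - ts_us[0]) / (int64_t)(n - 1);
}
```

With a 10 fps trigger, period_us would be 100000. A single late frame followed by an on-time frame produces one long delta and one short delta, so it is counted twice. A constant offset shows up in mean_delta_us, and passing that mean as period_us separates the offset from the jitter.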

The two following plots are the recorded deltas for a single 10000 frame capture period, using the DCACHE flush patches from above:

PLOT #1: V4L2 → application deltas

PLOT #2: VI → V4L2 timestamp deltas

The two following plots are the recorded deltas for a single 600 frame capture period, using the original dma-mapping.c implementation. Note that the V4L2 → application deltas do not coincide with the VI → V4L2 deltas, and the deltas in the latter are actually quite small (roughly +160us at worst).

PLOT #3: V4L2 → application deltas

PLOT #4: VI → V4L2 timestamp deltas

So do you mean the delay worsens from ~2ms to ~4ms compared with the patch from V4L2 capture jitter problem - #23 by ShaneCCC?

I’ve updated the plot post above and have numbered the plots from 1-4. Plots 1 & 2 show the DCACHE flush behavior. Plots 3 & 4 show the previous behavior.

Plots 3 & 4 show a consistent delta time of 100000us (which we’d expect at a 10fps capture rate). Plot 3 shows several “spikes”. Three of these are on the order of the 2ms jitter that caused me to write this post.

Plots 1 & 2 show a consistent delta time of 104000us: an extra 4ms appears, which I presume is from the DCACHE flush. Plots 1 & 2 also show 4 instances of 1ms jitter.

So do you mean the delay worsens from ~2ms to ~4ms compared with the patch from V4L2 capture jitter problem?

An overall delay of 4ms is introduced by the DCACHE flush, and an additional 1ms jitter is observed on top of that.

I hope this is clear. If not, let me know and I’ll try to clarify further.

tegra_flush_dcache_all() shouldn’t be used for all sync requests. For smaller buffers it isn’t beneficial and significantly increases the cache sync overhead.
It should only be used when the buffer length is larger than a few MB. Could you start with buffer_length >= 1MB as the threshold and check?
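
The size check suggested here could look roughly like the sketch below. It models only the decision and is written as plain user-space C, not kernel code. The helper name, the macro, and the choice to sum the lengths of all scatterlist segments are assumptions. The 1MB value is just the suggested starting point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Starting threshold from the suggestion above (1MB); expected to need tuning. */
#define FULL_FLUSH_THRESHOLD_BYTES ((size_t)1 << 20)

/*
 * Decide whether a sync request is large enough to justify a full cache
 * flush. seg_len holds the byte length of each segment, standing in for
 * sg->length in the patch. (Hypothetical helper, for illustration only.)
 */
bool should_use_full_flush(const size_t *seg_len, size_t nelems,
                           size_t threshold)
{
    size_t total = 0;

    assert(seg_len != NULL || nelems == 0);

    for (size_t i = 0; i < nelems; i++) {
        total += seg_len[i];
        /* Stop summing as soon as the threshold is reached. */
        if (total >= threshold)
            return true;
    }
    return false;
}
```

In the patched __iommu_sync_sg_for_cpu() and __iommu_sync_sg_for_device(), a check like this would choose between tegra_flush_dcache_all() and the original for_each_sg loop followed by dsb(sy), instead of disabling the loop with #if 0.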