Jetson TK1 -- fastest frame rate recording to flash at 1280x960

I am trying to use the OpenCV4Tegra library on a Jetson TK1 to record a video file from a stream of images of size 1280x960. I’d like to get 30Hz frame rate or as close to that as possible.
I need to store the images without skipping frames, losslessly if possible. but the most important thing is the frame rate. If there were a somewhat lossy but fast codec that took advantage of the GPU on the Jetson to give me a 30fps rate, that would be very helpful.

I’ve tried using the “FFV1” codec and the MJPEG codec and the best I’ve seen is ~8 - ! ~12 fps recorded to either the onboard flash or to the flash drives mounted in the SD card slot.

Can someone tell me what’s possible, and hopefully some hints or pointers to references on how to achieve the best frame rate I can?

thanks!

You’ll probably benefit by providing information on the nature of the source of video stream, e.g., a certain camera over USB3 or Gbit net.

Fair enough, and thanks for your response.

It is a Point Gray Flea3 over USB3. I have no problems getting frames from the camera at high rates – it’s encoding and saving them to flash that is taking all the time.

As far as I can tell, using gstreamer won’t work because the flea3 doesn’t appear as a video device and I have to use the Point Gray Flycap2 library to grab frames.

Or is it the case that no lossless compression will shrink a 1280x960 8 bit image enough to enable writing to flash at frame rate?

I’m not suggesting this as a fix, but more as an experiment. Have you tried using a regular hard drive or SSD on the SATA port? Since 1280X960 is only a bit over 1.1 MB/s, I would think the drive itself is not the issue. A related question: How did you determine that the saved file is not keeping up?

Video encoding is a very heavy operation and without support from a dedicated HW block it’s easily a bottle neck.

If you are already using the Point Gray Flycap2 library to grab frames then you should be able to use the GStreamer’s appsink plugin to feed the frames to a GStreamer pipeline doing the encode. You can set a high bitrate for the encoder to get a better quality video stream.

1280x960 8 bits per pixel is 1.2MB and if you want 30 fps, then it’s 35MB/s. If you actually meant 8 bits per component, then you would need a fast SSD to be able to constantly write that. I’m guessing that the eMMC can’t do even the 35 MB/s constantly.

I am doing a very similar thing (using a Point Grey Grasshopper 3 USB3.0 camera at 1920x1200 to record) and am also unable to grab images at 30fps without any frame skips. The image file is written to an SSD so the hard drive is not the issue and the CPU useage rarely exceeds 50%.

I have had some success with forcing the CPU to maximum and performance etc but I still cannot exceed 15fps without dropping at least 10 frames every 1000 and I can’t figure out why.

This is probably a silly question, but looking at a file of video it’s probably pretty easy to notice missing frames. However, how do you know in which part of the chain of processing the frames were dropped from?

I don’t have one of these cameras to test, but it would be very interesting to run strace (and perhaps ltrace) with the option to summarize time in various calls and see where time is spent both with and without capture to a file. One thing I always wonder about when a chain of applications are involved is whether one of the applications…which could remain open until done…is actually opening and closing over and over for a single operation. This adds an extreme amount of time cost versus staying open until done.

Try narrowing the bottleneck:

  • Did you verify with “lsusb -t” that you really use usb3 and not usb2?
  • How fast can you grab the images from the camera if you deletethem immediately (no encoding, no storing)?
  • Try writing the encoded stream to /dev/null to make sure the SSD does not affect performance

lsusb -t yields:
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=tegra-ehci/1p, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=tegra-xhci/2p, 5000M
|__ Port 1: Dev 2, If 0, Class=Vendor Specific Class, Driver=, 5000M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=tegra-xhci/6p, 480M

which, as far as I can tell, is indicative of USB3.0.

Without deleting them or doing anything other than reading them in I get 120 fps with some frame skips but 15, 30 and 45 fps cause no issues at all.

Frustratingly when I write the data to /dev/null no frame skips occur which suggests that the SSD is the issue after all.

How should I combat this? Is it likely to be a bus speed issue or cpu time instead? I can probably mess with threads and make one grab the data whilst the other writes it but this would only work if the bus speed of the SATA connection isn’t the issue.

How many different kinds of SATA drives have you tried? Is it just the SSD?

I have only tried using this SSD though I don’t believe it should have any issues. Running fsck shows nothing abnormal, I reformatted it to ext4 before use, and it is a crucial m500 (500Gb).
Running some benchmarks shows the following:

$sudo hdparm -tT /dev/sda1:

 /dev/sda1:
 Timing cached reads:   4836 MB in  2.00 seconds = 2420.93 MB/sec
 Timing buffered disk reads: 666 MB in  3.01 seconds = 221.51 MB/sec

$dd if=/dev/zero of=/mnt/R/testFile0 bs=8k count=40000;

40000+0 records in
40000+0 records out
327680000 bytes (328 MB) copied, 2.15787 s, 152 MB/s

I took your advice in running strace and found that when not recording (just idling) it spends 73.45% of the time in futex (which I believe is grabbing the image) yet when recording it spends more than 50% of the time writing to the file.

So far as hard drives differing, I think some of the driver code behaves differently between standard hard drives and SSDs…which is why I wondered.

So on the time spent writing to the file, does each write show a separate open (e.g., fopen) and close? Or does strace show the file opening once, and running till done, then close? The process of opening a file is very expensive. I once had to figure out why a script on financial data was taking 3 days to run. When I found (through strace) that the file opened and closed for each iteration of a loop, and modfied it to open once prior to the loop, the 3 days it took was reduced to 2 minutes. I’m wondering about time spent acquiring resources, rather than time within the resource.

There may be other variants on the same open/close theme, e.g., if each frame required opening and closing the camera, then this would be an overhead per frame instead of just startup time. This is often not an issue with single programs, but when multiple pieces are involved with each piece talking to another, then it can be astonishing just how expensive open/close gets. Take a close look at any open or close times.

Just done a heap of tests to see if I could minimize the write time and this is the output of

strace -T recorder -o straceRecordingDump.txt

I have only included one cycle of the loop that performs the writing as no other output is recorded until recording stops (it is a very repetitive output). Comments have been added by myself but as far as I can see there are not any erroneous open/close commands. All entries in <> are the times outputted by strace that measure the time between the start and end of a system call.

//Command given to camera to write image buffer into variable
gettimeofday({1452565663, 42291}, NULL) = 0 <0.000014>
futex(0x2c4a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 765, {1452565673, 42291000}, ffffffff) = 0 <0.019335>
futex(0x2c488, FUTEX_WAKE_PRIVATE, 1)   = 0 <0.000025>
clock_gettime(CLOCK_MONOTONIC, {6850, 784102561}) = 0 <0.000019>
timerfd_settime(13, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={6850, 884102000}}, NULL) = 0 <0.000026>
ioctl(14, USBDEVFS_SUBMITURB, 0x396a8)  = 0 <0.000045>
poll([{fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLOUT}], 4, 60000) = 1 ([{fd=14, revents=POLLOUT}]) <0.000029>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d684) = 0 <0.000022>
timerfd_settime(13, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000020>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d684) = -1 EAGAIN (Resource temporarily unavailable) <0.000021>
futex(0x2d0b4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2d0b0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000031>
futex(0x2d098, FUTEX_WAKE_PRIVATE, 1)   = 1 <0.001063>
clock_gettime(CLOCK_MONOTONIC, {6850, 786114039}) = 0 <0.000020>
timerfd_settime(13, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={6850, 886114000}}, NULL) = 0 <0.000024>
ioctl(14, USBDEVFS_SUBMITURB, 0x38968)  = 0 <0.000040>
poll([{fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLOUT}], 4, 60000) = 1 ([{fd=14, revents=POLLOUT}]) <0.000024>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d6b4) = 0 <0.000023>
timerfd_settime(13, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000021>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d6b4) = -1 EAGAIN (Resource temporarily unavailable) <0.000020>;
//The buffer has now been received in full
//Write to the ssd
write(19, "740t006000|07000p0x00t00000

//Command given to camera to write image buffer into variable
gettimeofday({1452565663, 42291}, NULL) = 0 <0.000014>
futex(0x2c4a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 765, {1452565673, 42291000}, ffffffff) = 0 <0.019335>
futex(0x2c488, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000025>
clock_gettime(CLOCK_MONOTONIC, {6850, 784102561}) = 0 <0.000019>
timerfd_settime(13, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={6850, 884102000}}, NULL) = 0 <0.000026>
ioctl(14, USBDEVFS_SUBMITURB, 0x396a8) = 0 <0.000045>
poll([{fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLOUT}], 4, 60000) = 1 ([{fd=14, revents=POLLOUT}]) <0.000029>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d684) = 0 <0.000022>
timerfd_settime(13, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000020>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d684) = -1 EAGAIN (Resource temporarily unavailable) <0.000021>
futex(0x2d0b4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2d0b0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000031>
futex(0x2d098, FUTEX_WAKE_PRIVATE, 1) = 1 <0.001063>
clock_gettime(CLOCK_MONOTONIC, {6850, 786114039}) = 0 <0.000020>
timerfd_settime(13, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={6850, 886114000}}, NULL) = 0 <0.000024>
ioctl(14, USBDEVFS_SUBMITURB, 0x38968) = 0 <0.000040>
poll([{fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLOUT}], 4, 60000) = 1 ([{fd=14, revents=POLLOUT}]) <0.000024>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d6b4) = 0 <0.000023>
timerfd_settime(13, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000021>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d6b4) = -1 EAGAIN (Resource temporarily unavailable) <0.000020>;
//The buffer has now been received in full
//Write to the ssd
write(19, “\374\7\10t\10\10\360\10\10|\10\7\370\10\10p\10\7x\10\10t\10\10\200\7\10\0\7\10\370\10”…, 4096) = 4096 <0.000052>; //first block written
write(19, “\10\t\0\10\t\0\10\10\304\7\7\1\10\7\16\10\7\16\10\t\200\10\10\4\10\10\n\10\10\212\10\7”…, 3452928) = 3452928 <0.010385>; //summary of the ensuing blocks
//repeat the above

0700"..., 4096) = 4096 <0.000052>; //first block written
write(19, "0\t

//Command given to camera to write image buffer into variable
gettimeofday({1452565663, 42291}, NULL) = 0 <0.000014>
futex(0x2c4a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 765, {1452565673, 42291000}, ffffffff) = 0 <0.019335>
futex(0x2c488, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000025>
clock_gettime(CLOCK_MONOTONIC, {6850, 784102561}) = 0 <0.000019>
timerfd_settime(13, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={6850, 884102000}}, NULL) = 0 <0.000026>
ioctl(14, USBDEVFS_SUBMITURB, 0x396a8) = 0 <0.000045>
poll([{fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLOUT}], 4, 60000) = 1 ([{fd=14, revents=POLLOUT}]) <0.000029>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d684) = 0 <0.000022>
timerfd_settime(13, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000020>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d684) = -1 EAGAIN (Resource temporarily unavailable) <0.000021>
futex(0x2d0b4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2d0b0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000031>
futex(0x2d098, FUTEX_WAKE_PRIVATE, 1) = 1 <0.001063>
clock_gettime(CLOCK_MONOTONIC, {6850, 786114039}) = 0 <0.000020>
timerfd_settime(13, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={6850, 886114000}}, NULL) = 0 <0.000024>
ioctl(14, USBDEVFS_SUBMITURB, 0x38968) = 0 <0.000040>
poll([{fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLOUT}], 4, 60000) = 1 ([{fd=14, revents=POLLOUT}]) <0.000024>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d6b4) = 0 <0.000023>
timerfd_settime(13, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000021>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d6b4) = -1 EAGAIN (Resource temporarily unavailable) <0.000020>;
//The buffer has now been received in full
//Write to the ssd
write(19, “\374\7\10t\10\10\360\10\10|\10\7\370\10\10p\10\7x\10\10t\10\10\200\7\10\0\7\10\370\10”…, 4096) = 4096 <0.000052>; //first block written
write(19, “\10\t\0\10\t\0\10\10\304\7\7\1\10\7\16\10\7\16\10\t\200\10\10\4\10\10\n\10\10\212\10\7”…, 3452928) = 3452928 <0.010385>; //summary of the ensuing blocks
//repeat the above

0\t

//Command given to camera to write image buffer into variable
gettimeofday({1452565663, 42291}, NULL) = 0 <0.000014>
futex(0x2c4a4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 765, {1452565673, 42291000}, ffffffff) = 0 <0.019335>
futex(0x2c488, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000025>
clock_gettime(CLOCK_MONOTONIC, {6850, 784102561}) = 0 <0.000019>
timerfd_settime(13, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={6850, 884102000}}, NULL) = 0 <0.000026>
ioctl(14, USBDEVFS_SUBMITURB, 0x396a8) = 0 <0.000045>
poll([{fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLOUT}], 4, 60000) = 1 ([{fd=14, revents=POLLOUT}]) <0.000029>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d684) = 0 <0.000022>
timerfd_settime(13, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000020>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d684) = -1 EAGAIN (Resource temporarily unavailable) <0.000021>
futex(0x2d0b4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2d0b0, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000031>
futex(0x2d098, FUTEX_WAKE_PRIVATE, 1) = 1 <0.001063>
clock_gettime(CLOCK_MONOTONIC, {6850, 786114039}) = 0 <0.000020>
timerfd_settime(13, TFD_TIMER_ABSTIME, {it_interval={0, 0}, it_value={6850, 886114000}}, NULL) = 0 <0.000024>
ioctl(14, USBDEVFS_SUBMITURB, 0x38968) = 0 <0.000040>
poll([{fd=9, events=POLLIN}, {fd=11, events=POLLIN}, {fd=13, events=POLLIN}, {fd=14, events=POLLOUT}], 4, 60000) = 1 ([{fd=14, revents=POLLOUT}]) <0.000024>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d6b4) = 0 <0.000023>
timerfd_settime(13, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000021>
ioctl(14, USBDEVFS_REAPURBNDELAY, 0xbe85d6b4) = -1 EAGAIN (Resource temporarily unavailable) <0.000020>;
//The buffer has now been received in full
//Write to the ssd
write(19, “\374\7\10t\10\10\360\10\10|\10\7\370\10\10p\10\7x\10\10t\10\10\200\7\10\0\7\10\370\10”…, 4096) = 4096 <0.000052>; //first block written
write(19, “\10\t\0\10\t\0\10\10\304\7\7\1\10\7\16\10\7\16\10\t\200\10\10\4\10\10\n\10\10\212\10\7”…, 3452928) = 3452928 <0.010385>; //summary of the ensuing blocks
//repeat the above

000406060\t000000\n00120"..., 3452928) = 3452928 <0.010385>; //summary of the ensuing blocks
//repeat the above

Annoyingly the time it takes to write to the ssd varies between 0.01 (as seen here) and sometimes increases to 0.039.
Additionally whilst futex is called 4 times in this output there are cases where it is only called twice and I am unsure as to why this is.

Thank you for your ongoing interest linuxdev.

Sometimes it is useful to see repetitive cycles; without repetition you won’t see outlying extreme cases (I love standard deviations! Such nice info…). You might want to look at summaries via “strace -c”. I haven’t used this in a while, but there are several variations which might tell part of the story. Having samples from non-frame-dropping commands and comparing to frame-dropping commands would be the most interesting. You’re not just interested in write times, as data itself changes things…and you need to know what’s different between good and bad variations of your command.

Did you try also maximising the EMC and the GPU clocks in addition to maximising the CPU clocks?

I have found a solution for my problem (though it may not necessarily apply to everyone) without maximizing the EMC or GPU clocks (as they are not in use)

The time it took to retrieve the image buffer from the camera was just under 1/30th of a second and the write time was just over 1/30th. This meant that whilst it managed to sneak all images to the hard drive before the next buffer arrived sometimes it would miss this (due to variations somewhere).

To resolve this I used a ring buffer of images. One thread reads the current image directly to the buffer (with the use of mutex locks) each time a new image was captured in the camera. The second thread reads the buffer whenever the current read index did not equal the write index (i.e. there were new frames to write) and would write to the SSD immediately. I still use the performance mode with all cores up and running at max capacity (just to be sure)

I have been able to record up to 48fps without frame skips with this method.

EMC is the memory controller.

But I would guess that it is already clocked quite high as you are doing some heavy work there. Might be worth to double check it though.