switching V4L2_MEMORY_USERPTR after each frame = visual artifact

Using the latest Jetpack 3.3, we see an odd artifact. When switching the V4L2Buffer buf.m.userptr after the frame has been dequeued, the screen flickers when the buffer is requeued to the driver.

I’d like to keep all incoming images from the V4L2 interface for a period of time (caching them to collect a bayer image set), ideally, without copying them.

One approach that seams plausible is simply re-assigning the V4L2 buffer buf.m.userptr after a V4L2 buffer has been dequeued, before re-queueing it as mentioned.

If we use the /tegra_multimedia_api/samples/v4l2cuda/capture.cpp example with the stock camera:

static const char *     dev_name        = "/dev/video0";
static io_method        io              = IO_METHOD_USERPTR;
static int              fd              = -1;
struct buffer *         buffers         = NULL;
static unsigned int     n_buffers       = 0;
static unsigned int     width           = 2592;
static unsigned int     height          = 1944;
static unsigned int     count           = 10;
static unsigned char *  cuda_out_buffer = NULL;
static bool             cuda_zero_copy = true;
static const char *     file_name       = "out.ppm";
static unsigned int     pixel_format    = V4L2_PIX_FMT_SBGGR10;
static unsigned int     field           = V4L2_FIELD_NONE;

and modify this function to demonstrate the behaviour:

static int
read_frame                      (void)
{
    struct v4l2_buffer buf;
    unsigned int i;

    switch (io) {
        case IO_METHOD_READ:
            ...
            break;

        case IO_METHOD_MMAP:
            ...
            break;

        case IO_METHOD_USERPTR:
            CLEAR (buf);

            buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            buf.memory = V4L2_MEMORY_USERPTR;

            if (-1 == xioctl (fd, VIDIOC_DQBUF, &buf)) {
                switch (errno) {
                    case EAGAIN:
                        return 0;

                    case EIO:
                        /* Could ignore EIO, see spec. */

                        /* fall through */

                    default:
                        errno_exit ("VIDIOC_DQBUF");
                }
            }

            for (i = 0; i < n_buffers; ++i)
                if (buf.m.userptr == (unsigned long) buffers[i].start
                        && buf.length == buffers[i].length)
                    break;

            assert (i < n_buffers);

                /* START TEST CODE */
                //process_image ((void *) buf.m.userptr);

                    // intentionally not freed for test
                    cudaError err = cudaMallocManaged (&buffers[i].start, buffers[i].length, cudaMemAttachGlobal);
                    if( err != cudaSuccess) {
                            printf("cudaMallocManaged failed!\n");
                    }
                    err = cudaDeviceSynchronize();
                    if( err != cudaSuccess) {
                        printf("cudaDeviceSynchronize failed!\n");
                    }

                    buf.m.userptr = (unsigned long) buffers[i].start;
                    printf("new pointer for buffer %d = %p\n", i, buffers[i].start);
                /* END OF TEST CODE */

if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                errno_exit ("VIDIOC_QBUF");

            break;
    }

    return 1;
}

Interestingly, if we wait 1 frame before re-queueing, i.e.

usleep(33000);

if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                errno_exit ("VIDIOC_QBUF");

we don’t see this issue. It seems as though there is some kind of race condition or DMA activity still going on underneath after the frame is de-queued.

If we increase n_buffers to 10, the behavior is the same (I was curious).

The concern is whether we have a possible race condition accessing the data if we have a fast kernel that operates on the fresly de-queued image? If the buf.m.userptr is de-queued and in user-space, to me it seems like the buffer should be complete, and ready to access at will.

Here is a video (capturing 100 frames), sorry about the poor camera:
https://drive.google.com/open?id=1mfPsA3xZiA4DRxpoW5_coOy9KfVl-jM_

If we take this one step further and cache the image data for each buffer (keeping the CUDA pointers and assigning new ones before re-queueing the buffer), de-bayering them and writing them back to disk, we see torn frames or black rows in some of the images. Some image data is incomplete, but V4L2_BUF_FLAG_ERROR is not set. Some images are fine. If we add the aforementioned ~33 ms delay the image data is correct.

Any thoughts on why this happens?

Full code for stock camera to copy and paste for testing this:

/*
 * Copyright (c) 2015, NVIDIA CORPORATION. All rights reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
 */

/*
 *  V4L2 video capture example
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#include <getopt.h>             /* getopt_long() */

#include <fcntl.h>              /* low-level i/o */
#include <unistd.h>
#include <errno.h>
#include <malloc.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <sys/ioctl.h>

#include <asm/types.h>          /* for videodev2.h */

#include <linux/videodev2.h>

#include <cuda_runtime.h>
#include "yuv2rgb.cuh"

#define CLEAR(x) memset (&(x), 0, sizeof (x))
#define ARRAY_SIZE(a)   (sizeof(a)/sizeof((a)[0]))

typedef enum {
    IO_METHOD_READ,
    IO_METHOD_MMAP,
    IO_METHOD_USERPTR,
} io_method;

struct buffer {
    void *                  start;
    size_t                  length;
};

static const char *     dev_name        = "/dev/video0";
static io_method        io              = IO_METHOD_USERPTR;
static int              fd              = -1;
struct buffer *         buffers         = NULL;
static unsigned int     n_buffers       = 0;
static unsigned int     width           = 2592;
static unsigned int     height          = 1944;
static unsigned int     count           = 10;
static unsigned char *  cuda_out_buffer = NULL;
static bool             cuda_zero_copy = true;
static const char *     file_name       = "out.ppm";
static unsigned int     pixel_format    = V4L2_PIX_FMT_SBGGR10;
static unsigned int     field           = V4L2_FIELD_NONE;

static void
errno_exit                      (const char *           s)
{
    fprintf (stderr, "%s error %d, %s\n",
            s, errno, strerror (errno));

    exit (EXIT_FAILURE);
}

static int
xioctl                          (int                    fd,
                                 int                    request,
                                 void *                 arg)
{
    int r;

    do r = ioctl (fd, request, arg);
    while (-1 == r && EINTR == errno);

    return r;
}

static void
process_image                   (void *           p)
{
    printf ("CUDA format conversion on frame %p\n", p);
    gpuConvertYUYVtoRGB ((unsigned char *) p, cuda_out_buffer, width, height);

    /* Save image. */
    if (count == 0) {
        FILE *fp = fopen (file_name, "wb");
        fprintf (fp, "P6\n%u %u\n255\n", width, height);
        fwrite (cuda_out_buffer, 1, width * height * 3, fp);
        fclose (fp);
    }
}

static int
read_frame                      (void)
{
    struct v4l2_buffer buf;
    unsigned int i;

    switch (io) {
        case IO_METHOD_READ:
            if (-1 == read (fd, buffers[0].start, buffers[0].length)) {
                switch (errno) {
                    case EAGAIN:
                        return 0;

                    case EIO:
                        /* Could ignore EIO, see spec. */

                        /* fall through */

                    default:
                        errno_exit ("read");
                }
            }

            process_image (buffers[0].start);

            break;

        case IO_METHOD_MMAP:
            CLEAR (buf);

            buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            buf.memory = V4L2_MEMORY_MMAP;

            if (-1 == xioctl (fd, VIDIOC_DQBUF, &buf)) {
                switch (errno) {
                    case EAGAIN:
                        return 0;

                    case EIO:
                        /* Could ignore EIO, see spec. */

                        /* fall through */

                    default:
                        errno_exit ("VIDIOC_DQBUF");
                }
            }

            assert (buf.index < n_buffers);

            process_image (buffers[buf.index].start);

            if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                errno_exit ("VIDIOC_QBUF");

            break;

        case IO_METHOD_USERPTR:
            CLEAR (buf);

            buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            buf.memory = V4L2_MEMORY_USERPTR;

            if (-1 == xioctl (fd, VIDIOC_DQBUF, &buf)) {
                switch (errno) {
                    case EAGAIN:
                        return 0;

                    case EIO:
                        /* Could ignore EIO, see spec. */

                        /* fall through */

                    default:
                        errno_exit ("VIDIOC_DQBUF");
                }
            }

            for (i = 0; i < n_buffers; ++i)
                if (buf.m.userptr == (unsigned long) buffers[i].start
                        && buf.length == buffers[i].length)
                    break;

            assert (i < n_buffers);

                /* START TEST CODE */
                //process_image ((void *) buf.m.userptr);

                    // intentionally not freed for test
                    cudaError err = cudaMallocManaged (&buffers[i].start, buffers[i].length, cudaMemAttachGlobal);
                    if( err != cudaSuccess) {
                            printf("cudaMallocManaged failed!\n");
                    }
                    err = cudaDeviceSynchronize();
                    if( err != cudaSuccess) {
                        printf("cudaDeviceSynchronize failed!\n");
                    }
                    buf.m.userptr = (unsigned long) buffers[i].start;
                    printf("new pointer for buffer %d = %p\n", i, buffers[i].start);
                /* END OF TEST CODE */

if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                errno_exit ("VIDIOC_QBUF");

            break;
    }

    return 1;
}

static void
mainloop                        (void)
{
    while (count-- > 0) {
        for (;;) {
            fd_set fds;
            struct timeval tv;
            int r;

            FD_ZERO (&fds);
            FD_SET (fd, &fds);

            /* Timeout. */
            tv.tv_sec = 2;
            tv.tv_usec = 0;

            r = select (fd + 1, &fds, NULL, NULL, &tv);

            if (-1 == r) {
                if (EINTR == errno)
                    continue;

                errno_exit ("select");
            }

            if (0 == r) {
                fprintf (stderr, "select timeout\n");
                exit (EXIT_FAILURE);
            }

            if (read_frame ())
                break;

            /* EAGAIN - continue select loop. */
        }
    }
}

static void
stop_capturing                  (void)
{
    enum v4l2_buf_type type;

    switch (io) {
        case IO_METHOD_READ:
            /* Nothing to do. */
            break;

        case IO_METHOD_MMAP:
        case IO_METHOD_USERPTR:
            type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

            if (-1 == xioctl (fd, VIDIOC_STREAMOFF, &type))
                errno_exit ("VIDIOC_STREAMOFF");

            break;
    }
}

static void
start_capturing                 (void)
{
    unsigned int i;
    enum v4l2_buf_type type;

    switch (io) {
        case IO_METHOD_READ:
            /* Nothing to do. */
            break;

        case IO_METHOD_MMAP:
            for (i = 0; i < n_buffers; ++i) {
                struct v4l2_buffer buf;

                CLEAR (buf);

                buf.type        = V4L2_BUF_TYPE_VIDEO_CAPTURE;
                buf.memory      = V4L2_MEMORY_MMAP;
                buf.index       = i;

                if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                    errno_exit ("VIDIOC_QBUF");
            }

            type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

            if (-1 == xioctl (fd, VIDIOC_STREAMON, &type))
                errno_exit ("VIDIOC_STREAMON");

            break;

        case IO_METHOD_USERPTR:
            for (i = 0; i < n_buffers; ++i) {
                struct v4l2_buffer buf;

                CLEAR (buf);

                buf.type        = V4L2_BUF_TYPE_VIDEO_CAPTURE;
                buf.memory      = V4L2_MEMORY_USERPTR;
                buf.index       = i;
                buf.m.userptr   = (unsigned long) buffers[i].start;
                buf.length      = buffers[i].length;

                if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                    errno_exit ("VIDIOC_QBUF");
            }

            type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

            if (-1 == xioctl (fd, VIDIOC_STREAMON, &type))
                errno_exit ("VIDIOC_STREAMON");

            break;
    }
}

static void
uninit_device                   (void)
{
    unsigned int i;

    switch (io) {
        case IO_METHOD_READ:
            free (buffers[0].start);
            break;

        case IO_METHOD_MMAP:
            for (i = 0; i < n_buffers; ++i)
                if (-1 == munmap (buffers[i].start, buffers[i].length))
                    errno_exit ("munmap");
            break;

        case IO_METHOD_USERPTR:
            for (i = 0; i < n_buffers; ++i) {
                if (cuda_zero_copy) {
                    cudaFree (buffers[i].start);
                } else {
                    free (buffers[i].start);
                }
            }
            break;
    }

    free (buffers);

    if (cuda_zero_copy) {
        cudaFree (cuda_out_buffer);
    }
}

static void
init_read                       (unsigned int           buffer_size)
{
    buffers = (struct buffer *) calloc (1, sizeof (*buffers));

    if (!buffers) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }

    buffers[0].length = buffer_size;
    buffers[0].start = malloc (buffer_size);

    if (!buffers[0].start) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }
}

static void
init_mmap                       (void)
{
    struct v4l2_requestbuffers req;

    CLEAR (req);

    req.count               = 4;
    req.type                = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory              = V4L2_MEMORY_MMAP;

    if (-1 == xioctl (fd, VIDIOC_REQBUFS, &req)) {
        if (EINVAL == errno) {
            fprintf (stderr, "%s does not support "
                    "memory mapping\n", dev_name);
            exit (EXIT_FAILURE);
        } else {
            errno_exit ("VIDIOC_REQBUFS");
        }
    }

    if (req.count < 2) {
        fprintf (stderr, "Insufficient buffer memory on %s\n",
                dev_name);
        exit (EXIT_FAILURE);
    }

    buffers = (struct buffer *) calloc (req.count, sizeof (*buffers));

    if (!buffers) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }

    for (n_buffers = 0; n_buffers < req.count; ++n_buffers) {
        struct v4l2_buffer buf;

        CLEAR (buf);

        buf.type        = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory      = V4L2_MEMORY_MMAP;
        buf.index       = n_buffers;

        if (-1 == xioctl (fd, VIDIOC_QUERYBUF, &buf))
            errno_exit ("VIDIOC_QUERYBUF");

        buffers[n_buffers].length = buf.length;
        buffers[n_buffers].start =
            mmap (NULL /* start anywhere */,
                    buf.length,
                    PROT_READ | PROT_WRITE /* required */,
                    MAP_SHARED /* recommended */,
                    fd, buf.m.offset);

        if (MAP_FAILED == buffers[n_buffers].start)
            errno_exit ("mmap");
    }
}

static void
init_userp                      (unsigned int           buffer_size)
{
    struct v4l2_requestbuffers req;
    unsigned int page_size;

    page_size = getpagesize ();
    buffer_size = (buffer_size + page_size - 1) & ~(page_size - 1);

    CLEAR (req);

    req.count               = 4;
    req.type                = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory              = V4L2_MEMORY_USERPTR;

    if (-1 == xioctl (fd, VIDIOC_REQBUFS, &req)) {
        if (EINVAL == errno) {
            fprintf (stderr, "%s does not support "
                    "user pointer i/o\n", dev_name);
            exit (EXIT_FAILURE);
        } else {
            errno_exit ("VIDIOC_REQBUFS");
        }
    }

    buffers = (struct buffer *) calloc (4, sizeof (*buffers));

    if (!buffers) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }

    for (n_buffers = 0; n_buffers < 4; ++n_buffers) {
        buffers[n_buffers].length = buffer_size;
        if (cuda_zero_copy) {
            cudaMallocManaged (&buffers[n_buffers].start, buffer_size, cudaMemAttachGlobal);
        } else {
            buffers[n_buffers].start = memalign (/* boundary */ page_size,
                    buffer_size);
        }

        if (!buffers[n_buffers].start) {
            fprintf (stderr, "Out of memory\n");
            exit (EXIT_FAILURE);
        }
    }
}

static void
init_device                     (void)
{
    struct v4l2_capability cap;
    struct v4l2_cropcap cropcap;
    struct v4l2_crop crop;
    struct v4l2_format fmt;
    unsigned int min;

    if (-1 == xioctl (fd, VIDIOC_QUERYCAP, &cap)) {
        if (EINVAL == errno) {
            fprintf (stderr, "%s is no V4L2 device\n",
                    dev_name);
            exit (EXIT_FAILURE);
        } else {
            errno_exit ("VIDIOC_QUERYCAP");
        }
    }

    if (!(cap.capabilities & V4L2_CAP_VIDEO_CAPTURE)) {
        fprintf (stderr, "%s is no video capture device\n",
                dev_name);
        exit (EXIT_FAILURE);
    }

    switch (io) {
        case IO_METHOD_READ:
            if (!(cap.capabilities & V4L2_CAP_READWRITE)) {
                fprintf (stderr, "%s does not support read i/o\n",
                        dev_name);
                exit (EXIT_FAILURE);
            }

            break;

        case IO_METHOD_MMAP:
        case IO_METHOD_USERPTR:
            if (!(cap.capabilities & V4L2_CAP_STREAMING)) {
                fprintf (stderr, "%s does not support streaming i/o\n",
                        dev_name);
                exit (EXIT_FAILURE);
            }

            break;
    }

/* Select video input, video standard and tune here. */

CLEAR (cropcap);

    cropcap.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

    if (0 == xioctl (fd, VIDIOC_CROPCAP, &cropcap)) {
        crop.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        crop.c = cropcap.defrect; /* reset to default */

        if (-1 == xioctl (fd, VIDIOC_S_CROP, &crop)) {
            switch (errno) {
                case EINVAL:
                    /* Cropping not supported. */
                    break;
                default:
                    /* Errors ignored. */
                    break;
            }
        }
    } else {
        /* Errors ignored. */
    }

CLEAR (fmt);

    fmt.type                = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width       = width;
    fmt.fmt.pix.height      = height;
    fmt.fmt.pix.pixelformat = pixel_format;
    fmt.fmt.pix.field       = field;

    if (-1 == xioctl (fd, VIDIOC_S_FMT, &fmt))
        errno_exit ("VIDIOC_S_FMT");

    /* Note VIDIOC_S_FMT may change width and height. */

    /* Buggy driver paranoia. */
    min = fmt.fmt.pix.width * 2;
    if (fmt.fmt.pix.bytesperline < min)
        fmt.fmt.pix.bytesperline = min;
    min = fmt.fmt.pix.bytesperline * fmt.fmt.pix.height;
    if (fmt.fmt.pix.sizeimage < min)
        fmt.fmt.pix.sizeimage = min;

    switch (io) {
        case IO_METHOD_READ:
            init_read (fmt.fmt.pix.sizeimage);
            break;

        case IO_METHOD_MMAP:
            init_mmap ();
            break;

        case IO_METHOD_USERPTR:
            init_userp (fmt.fmt.pix.sizeimage);
            break;
    }
}

static void
close_device                    (void)
{
    if (-1 == close (fd))
        errno_exit ("close");

    fd = -1;
}

static void
open_device                     (void)
{
    struct stat st;

    if (-1 == stat (dev_name, &st)) {
        fprintf (stderr, "Cannot identify '%s': %d, %s\n",
                dev_name, errno, strerror (errno));
        exit (EXIT_FAILURE);
    }

    if (!S_ISCHR (st.st_mode)) {
        fprintf (stderr, "%s is no device\n", dev_name);
        exit (EXIT_FAILURE);
    }

    fd = open (dev_name, O_RDWR /* required */ | O_NONBLOCK, 0);

    if (-1 == fd) {
        fprintf (stderr, "Cannot open '%s': %d, %s\n",
                dev_name, errno, strerror (errno));
        exit (EXIT_FAILURE);
    }
}

static void
init_cuda                       (void)
{
    /* Check unified memory support. */
    if (cuda_zero_copy) {
        cudaDeviceProp devProp;
        cudaGetDeviceProperties (&devProp, 0);
        if (!devProp.managedMemory) {
            printf ("CUDA device does not support managed memory.\n");
            cuda_zero_copy = false;
        }
    }

    /* Allocate output buffer. */
    size_t size = width * height * 3;
    if (cuda_zero_copy) {
        cudaMallocManaged (&cuda_out_buffer, size, cudaMemAttachGlobal);
    } else {
        cuda_out_buffer = (unsigned char *) malloc (size);
    }

    cudaDeviceSynchronize ();
}

static void
usage                           (FILE *                 fp,
                                 int                    argc,
                                 char **                argv)
{
    fprintf (fp,
            "Usage: %s [options]\n\n"
            "Options:\n"
            "-c | --count N       Frame count (default: %u)\n"
            "-d | --device name   Video device name (default: %s)\n"
            "-f | --format        Capture input pixel format (default: UYVY)\n"
            "-h | --help          Print this message\n"
            "-m | --mmap          Use memory mapped buffers\n"
            "-o | --output        Output file name (default: %s)\n"
            "-s | --size WxH      Frame size (default: %ux%u)\n"
            "-u | --userp         Use application allocated buffers\n"
            "-z | --zcopy         Use zero copy CUDA memory\n"
            "Experimental options:\n"
            "-r | --read          Use read() calls\n"
            "-F | --field         Capture field (default: INTERLACED)\n"
            "",
            argv[0], count, dev_name, file_name, width, height);
}

static const char short_options [] = "c:d:f:F:hmo:rs:uz";

static const struct option
long_options [] = {
    { "count",      required_argument,      NULL,           'c' },
    { "device",     required_argument,      NULL,           'd' },
    { "format",     required_argument,      NULL,           'f' },
    { "field",      required_argument,      NULL,           'F' },
    { "help",       no_argument,            NULL,           'h' },
    { "mmap",       no_argument,            NULL,           'm' },
    { "output",     required_argument,      NULL,           'o' },
    { "read",       no_argument,            NULL,           'r' },
    { "size",       required_argument,      NULL,           's' },
    { "userp",      no_argument,            NULL,           'u' },
    { "zcopy",      no_argument,            NULL,           'z' },
    { 0, 0, 0, 0 }
};

static struct {
    const char *name;
    unsigned int fourcc;
} pixel_formats[] = {
    { "RGB332", V4L2_PIX_FMT_RGB332 },
    { "RGB555", V4L2_PIX_FMT_RGB555 },
    { "RGB565", V4L2_PIX_FMT_RGB565 },
    { "RGB555X", V4L2_PIX_FMT_RGB555X },
    { "RGB565X", V4L2_PIX_FMT_RGB565X },
    { "BGR24", V4L2_PIX_FMT_BGR24 },
    { "RGB24", V4L2_PIX_FMT_RGB24 },
    { "BGR32", V4L2_PIX_FMT_BGR32 },
    { "RGB32", V4L2_PIX_FMT_RGB32 },
    { "Y8", V4L2_PIX_FMT_GREY },
    { "Y10", V4L2_PIX_FMT_Y10 },
    { "Y12", V4L2_PIX_FMT_Y12 },
    { "Y16", V4L2_PIX_FMT_Y16 },
    { "UYVY", V4L2_PIX_FMT_UYVY },
    { "VYUY", V4L2_PIX_FMT_VYUY },
    { "YUYV", V4L2_PIX_FMT_YUYV },
    { "YVYU", V4L2_PIX_FMT_YVYU },
    { "NV12", V4L2_PIX_FMT_NV12 },
    { "NV21", V4L2_PIX_FMT_NV21 },
    { "NV16", V4L2_PIX_FMT_NV16 },
    { "NV61", V4L2_PIX_FMT_NV61 },
    { "NV24", V4L2_PIX_FMT_NV24 },
    { "NV42", V4L2_PIX_FMT_NV42 },
    { "SBGGR8", V4L2_PIX_FMT_SBGGR8 },
    { "SGBRG8", V4L2_PIX_FMT_SGBRG8 },
    { "SGRBG8", V4L2_PIX_FMT_SGRBG8 },
    { "SRGGB8", V4L2_PIX_FMT_SRGGB8 },
    { "SBGGR10_DPCM8", V4L2_PIX_FMT_SBGGR10DPCM8 },
    { "SGBRG10_DPCM8", V4L2_PIX_FMT_SGBRG10DPCM8 },
    { "SGRBG10_DPCM8", V4L2_PIX_FMT_SGRBG10DPCM8 },
    { "SRGGB10_DPCM8", V4L2_PIX_FMT_SRGGB10DPCM8 },
    { "SBGGR10", V4L2_PIX_FMT_SBGGR10 },
    { "SGBRG10", V4L2_PIX_FMT_SGBRG10 },
    { "SGRBG10", V4L2_PIX_FMT_SGRBG10 },
    { "SRGGB10", V4L2_PIX_FMT_SRGGB10 },
    { "SBGGR12", V4L2_PIX_FMT_SBGGR12 },
    { "SGBRG12", V4L2_PIX_FMT_SGBRG12 },
    { "SGRBG12", V4L2_PIX_FMT_SGRBG12 },
    { "SRGGB12", V4L2_PIX_FMT_SRGGB12 },
    { "DV", V4L2_PIX_FMT_DV },
    { "MJPEG", V4L2_PIX_FMT_MJPEG },
    { "MPEG", V4L2_PIX_FMT_MPEG },
};

static unsigned int v4l2_format_code(const char *name)
{
    unsigned int i;

    for (i = 0; i < ARRAY_SIZE(pixel_formats); ++i) {
        if (strcasecmp(pixel_formats[i].name, name) == 0)
            return pixel_formats[i].fourcc;
    }

    return 0;
}

static struct {
    const char *name;
    unsigned int field;
} fields[] = {
    { "ANY", V4L2_FIELD_ANY },
    { "NONE", V4L2_FIELD_NONE },
    { "TOP", V4L2_FIELD_TOP },
    { "BOTTOM", V4L2_FIELD_BOTTOM },
    { "INTERLACED", V4L2_FIELD_INTERLACED },
    { "SEQ_TB", V4L2_FIELD_SEQ_TB },
    { "SEQ_BT", V4L2_FIELD_SEQ_BT },
    { "ALTERNATE", V4L2_FIELD_ALTERNATE },
    { "INTERLACED_TB", V4L2_FIELD_INTERLACED_TB },
    { "INTERLACED_BT", V4L2_FIELD_INTERLACED_BT },
};

static unsigned int v4l2_field_code(const char *name)
{
    unsigned int i;

    for (i = 0; i < ARRAY_SIZE(fields); ++i) {
        if (strcasecmp(fields[i].name, name) == 0)
            return fields[i].field;
    }

    return -1;
}

int
main                            (int                    argc,
                                 char **                argv)
{
    for (;;) {
        int index;
        int c;

        c = getopt_long (argc, argv,
                short_options, long_options,
                &index);

        if (-1 == c)
            break;

        switch (c) {
            case 0: /* getopt_long() flag */
                break;

            case 'c':
                count = atoi (optarg);
                break;

            case 'd':
                dev_name = optarg;
                break;

            case 'f':
                pixel_format = v4l2_format_code(optarg);
                if (pixel_format == 0) {
                    printf("Unsupported video format '%s'\n", optarg);
                    pixel_format = V4L2_PIX_FMT_UYVY;
                }
                break;

            case 'F':
                field = v4l2_field_code(optarg);
                if ((int)field < 0) {
                    printf("Unsupported field '%s'\n", optarg);
                    field = V4L2_FIELD_INTERLACED;
                }
                break;

            case 'h':
                usage (stdout, argc, argv);
                exit (EXIT_SUCCESS);

            case 'm':
                io = IO_METHOD_MMAP;
                break;

            case 'o':
                file_name = optarg;
                break;

            case 'r':
                io = IO_METHOD_READ;
                break;

            case 's':
                width = atoi (strtok (optarg, "x"));
                height = atoi (strtok (NULL, "x"));
                break;

            case 'u':
                io = IO_METHOD_USERPTR;
                break;

            case 'z':
                cuda_zero_copy = true;
                break;

            default:
                usage (stderr, argc, argv);
                exit (EXIT_FAILURE);
        }
    }

    open_device ();

    init_device ();

    init_cuda ();

    start_capturing ();

    mainloop ();

    stop_capturing ();

    uninit_device ();

    close_device ();

    exit (EXIT_SUCCESS);

    return 0;
}

Hi,
We would like to reproduce the issue. Can it be reproduced with default camera board(ov5693)?

Hi Dane,

Yes, this happens on both the stock camera and the IMX378

Hi,
Could you try

buffers[i].start = memalign(getpagesize (), buffers[i].length);
buf.m.userptr = (unsigned long) buffers[i].start;
printf("new pointer for buffer %d = %p\n", i, buffers[i].start);

Would like to confirm it also happens to memalign(). Not specific to cudaMallocManaged().

Thanks for the suggestion Dane.

I gave this a try and the result is the same.

Switching to

buffers[i].start = memalign(getpagesize (), buffers[i].length);

for both the initial allocation and the re-allocation after de-queuing.

Additionally, I added some opencv convenience code, save_image(), which debayers and saves the images to disk as .bmp after the mainloop() is complete (not during since this seems time dependent). Lastly, a 4L2_BUF_FLAG_ERROR check as well.

/*
 * Copyright (c) 2015, NVIDIA CORPORATION. All rights reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
 */

/*
 *  V4L2 video capture example
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#include <getopt.h>             /* getopt_long() */

#include <fcntl.h>              /* low-level i/o */
#include <unistd.h>
#include <errno.h>
#include <malloc.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <sys/ioctl.h>

#include <asm/types.h>          /* for videodev2.h */

#include <linux/videodev2.h>

#include <cuda_runtime.h>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

#define CLEAR(x) memset (&(x), 0, sizeof (x))
#define ARRAY_SIZE(a)   (sizeof(a)/sizeof((a)[0]))

typedef enum {
    IO_METHOD_READ,
    IO_METHOD_MMAP,
    IO_METHOD_USERPTR,
} io_method;

struct buffer {
    void *                  start;
    size_t                  length;
};

static const char *     dev_name        = "/dev/video0";
static io_method        io              = IO_METHOD_USERPTR;
static int              fd              = -1;
struct buffer *         buffers         = NULL;
static unsigned int     n_buffers       = 0;
static unsigned int     width           = 3840;
static unsigned int     height          = 2160;
static const unsigned int count          = 10;
static unsigned char *  cuda_out_buffer = NULL;
static bool             cuda_zero_copy = true;
static const char *     file_name       = "out.ppm";
static unsigned int     pixel_format    = V4L2_PIX_FMT_SBGGR10;
static unsigned int     field           = V4L2_FIELD_NONE;

int bufsToSaveIndex = 0;
void** bufsToSave;

void save_image(int i, void* ptr)
{
	printf("saving image: %d = %p\n", i, ptr);
	cv::Mat inputMat_16UC1 = cv::Mat( height, width, CV_16UC1, ptr);
	cv::Mat outputMat_16UC3 = cv::Mat( height, width, CV_16UC3 );
	// convert RG10 to RGB 16-bit
	cvtColor( inputMat_16UC1, outputMat_16UC3, cv::COLOR_BayerRG2RGB);

	// scale it to 8-bit
	cv::Mat outputMat_8UC3 = cv::Mat( height, width, CV_8UC3);
	outputMat_16UC3.convertTo( outputMat_8UC3, CV_8UC3, 0.0625 );

	std::string path = std::string("/home/nvidia/image_") + std::to_string(i) + std::string(".bmp");
	cv::imwrite(path, outputMat_8UC3);
}

static void
errno_exit                      (const char *           s)
{
    fprintf (stderr, "%s error %d, %s\n",
            s, errno, strerror (errno));

    exit (EXIT_FAILURE);
}

static int
xioctl                          (int                    fd,
                                 int                    request,
                                 void *                 arg)
{
    int r;

    do r = ioctl (fd, request, arg);
    while (-1 == r && EINTR == errno);

    return r;
}

static void
process_image                   (void *           p)
{
    printf ("CUDA format conversion on frame %p\n", p);
#if 0
    gpuConvertYUYVtoRGB ((unsigned char *) p, cuda_out_buffer, width, height);

    /* Save image. */
    if (count == 0) {
        FILE *fp = fopen (file_name, "wb");
        fprintf (fp, "P6\n%u %u\n255\n", width, height);
        fwrite (cuda_out_buffer, 1, width * height * 3, fp);
        fclose (fp);
    }
#endif
}

static int
read_frame                      (void)
{
    struct v4l2_buffer buf;
    unsigned int i;

    switch (io) {
        case IO_METHOD_READ:
            if (-1 == read (fd, buffers[0].start, buffers[0].length)) {
                switch (errno) {
                    case EAGAIN:
                        return 0;

                    case EIO:
                        /* Could ignore EIO, see spec. */

                        /* fall through */

                    default:
                        errno_exit ("read");
                }
            }

            process_image (buffers[0].start);

            break;

        case IO_METHOD_MMAP:
            CLEAR (buf);

            buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            buf.memory = V4L2_MEMORY_MMAP;

            if (-1 == xioctl (fd, VIDIOC_DQBUF, &buf)) {
                switch (errno) {
                    case EAGAIN:
                        return 0;

                    case EIO:
                        /* Could ignore EIO, see spec. */

                        /* fall through */

                    default:
                        errno_exit ("VIDIOC_DQBUF");
                }
            }

            assert (buf.index < n_buffers);

            process_image (buffers[buf.index].start);

            if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                errno_exit ("VIDIOC_QBUF");

            break;

        case IO_METHOD_USERPTR:
            CLEAR (buf);

            buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            buf.memory = V4L2_MEMORY_USERPTR;

            if (-1 == xioctl (fd, VIDIOC_DQBUF, &buf)) {
                switch (errno) {
                    case EAGAIN:
                        return 0;

                    case EIO:
                        /* Could ignore EIO, see spec. */

                        /* fall through */

                    default:
                        errno_exit ("VIDIOC_DQBUF");
                }
            }

            for (i = 0; i < n_buffers; ++i)
                if (buf.m.userptr == (unsigned long) buffers[i].start
                        && buf.length == buffers[i].length)
                    break;

            assert (i < n_buffers);

                /* START TEST CODE */
            		//process_image ((void *) buf.m.userptr);

					// we will write this buffer to disk later, save the pointer
					if( buf.flags & V4L2_BUF_FLAG_ERROR ){
						printf("DATA ERROR for buffer: %d\n", i);
					}
					bufsToSave[bufsToSaveIndex++] = buffers[i].start;

                    // intentionally not freed for test
#if 0
                    cudaError err = cudaMallocManaged (&buffers[i].start, buffers[i].length, cudaMemAttachGlobal);
                    if( err != cudaSuccess) {
                            printf("cudaMallocManaged failed!\n");
                    }
                    err = cudaDeviceSynchronize();
                    if( err != cudaSuccess) {
                        printf("cudaDeviceSynchronize failed!\n");
                    }
#endif
                    buffers[i].start = memalign(getpagesize (), buffers[i].length);
                    buf.m.userptr = (unsigned long) buffers[i].start;
                    printf("new pointer for buffer %d = %p\n", i, buffers[i].start);
                /* END OF TEST CODE */

if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                errno_exit ("VIDIOC_QBUF");

            break;
    }

    return 1;
}

static void
mainloop                        (void)
{
	// don't change count
	int cnt = count;
    while (cnt-- > 0) {
        for (;;) {
            fd_set fds;
            struct timeval tv;
            int r;

            FD_ZERO (&fds);
            FD_SET (fd, &fds);

            /* Timeout. */
            tv.tv_sec = 2;
            tv.tv_usec = 0;

            r = select (fd + 1, &fds, NULL, NULL, &tv);

            if (-1 == r) {
                if (EINTR == errno)
                    continue;

                errno_exit ("select");
            }

            if (0 == r) {
                fprintf (stderr, "select timeout\n");
                exit (EXIT_FAILURE);
            }

            if (read_frame ())
                break;

            /* EAGAIN - continue select loop. */
        }
    }
}

static void
stop_capturing                  (void)
{
    enum v4l2_buf_type type;

    switch (io) {
        case IO_METHOD_READ:
            /* Nothing to do. */
            break;

        case IO_METHOD_MMAP:
        case IO_METHOD_USERPTR:
            type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

            if (-1 == xioctl (fd, VIDIOC_STREAMOFF, &type))
                errno_exit ("VIDIOC_STREAMOFF");

            break;
    }
}

static void
start_capturing                 (void)
{
    unsigned int i;
    enum v4l2_buf_type type;

    switch (io) {
        case IO_METHOD_READ:
            /* Nothing to do. */
            break;

        case IO_METHOD_MMAP:
            for (i = 0; i < n_buffers; ++i) {
                struct v4l2_buffer buf;

                CLEAR (buf);

                buf.type        = V4L2_BUF_TYPE_VIDEO_CAPTURE;
                buf.memory      = V4L2_MEMORY_MMAP;
                buf.index       = i;

                if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                    errno_exit ("VIDIOC_QBUF");
            }

            type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

            if (-1 == xioctl (fd, VIDIOC_STREAMON, &type))
                errno_exit ("VIDIOC_STREAMON");

            break;

        case IO_METHOD_USERPTR:
            for (i = 0; i < n_buffers; ++i) {
                struct v4l2_buffer buf;

                CLEAR (buf);

                buf.type        = V4L2_BUF_TYPE_VIDEO_CAPTURE;
                buf.memory      = V4L2_MEMORY_USERPTR;
                buf.index       = i;
                buf.m.userptr   = (unsigned long) buffers[i].start;
                buf.length      = buffers[i].length;

                if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                    errno_exit ("VIDIOC_QBUF");
            }

            type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

            if (-1 == xioctl (fd, VIDIOC_STREAMON, &type))
                errno_exit ("VIDIOC_STREAMON");

            break;
    }
}

static void
uninit_device                   (void)
{
    unsigned int i;

    switch (io) {
        case IO_METHOD_READ:
            free (buffers[0].start);
            break;

        case IO_METHOD_MMAP:
            for (i = 0; i < n_buffers; ++i)
                if (-1 == munmap (buffers[i].start, buffers[i].length))
                    errno_exit ("munmap");
            break;

        case IO_METHOD_USERPTR:
            for (i = 0; i < n_buffers; ++i) {
                if (cuda_zero_copy) {
                    cudaFree (buffers[i].start);
                } else {
                    free (buffers[i].start);
                }
            }
            break;
    }

    free (buffers);

    if (cuda_zero_copy) {
        cudaFree (cuda_out_buffer);
    }
}

static void
init_read                       (unsigned int           buffer_size)
{
    buffers = (struct buffer *) calloc (1, sizeof (*buffers));

    if (!buffers) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }

    buffers[0].length = buffer_size;
    buffers[0].start = malloc (buffer_size);

    if (!buffers[0].start) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }
}

static void
init_mmap                       (void)
{
    struct v4l2_requestbuffers req;

    CLEAR (req);

    req.count               = 4;
    req.type                = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory              = V4L2_MEMORY_MMAP;

    if (-1 == xioctl (fd, VIDIOC_REQBUFS, &req)) {
        if (EINVAL == errno) {
            fprintf (stderr, "%s does not support "
                    "memory mapping\n", dev_name);
            exit (EXIT_FAILURE);
        } else {
            errno_exit ("VIDIOC_REQBUFS");
        }
    }

    if (req.count < 2) {
        fprintf (stderr, "Insufficient buffer memory on %s\n",
                dev_name);
        exit (EXIT_FAILURE);
    }

    buffers = (struct buffer *) calloc (req.count, sizeof (*buffers));

    if (!buffers) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }

    for (n_buffers = 0; n_buffers < req.count; ++n_buffers) {
        struct v4l2_buffer buf;

        CLEAR (buf);

        buf.type        = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory      = V4L2_MEMORY_MMAP;
        buf.index       = n_buffers;

        if (-1 == xioctl (fd, VIDIOC_QUERYBUF, &buf))
            errno_exit ("VIDIOC_QUERYBUF");

        buffers[n_buffers].length = buf.length;
        buffers[n_buffers].start =
            mmap (NULL /* start anywhere */,
                    buf.length,
                    PROT_READ | PROT_WRITE /* required */,
                    MAP_SHARED /* recommended */,
                    fd, buf.m.offset);

        if (MAP_FAILED == buffers[n_buffers].start)
            errno_exit ("mmap");
    }
}

static void
init_userp                      (unsigned int           buffer_size)
{
    struct v4l2_requestbuffers req;
    unsigned int page_size;

    page_size = getpagesize ();
    buffer_size = (buffer_size + page_size - 1) & ~(page_size - 1);

    CLEAR (req);

    req.count               = 4;
    req.type                = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory              = V4L2_MEMORY_USERPTR;

    if (-1 == xioctl (fd, VIDIOC_REQBUFS, &req)) {
        if (EINVAL == errno) {
            fprintf (stderr, "%s does not support "
                    "user pointer i/o\n", dev_name);
            exit (EXIT_FAILURE);
        } else {
            errno_exit ("VIDIOC_REQBUFS");
        }
    }

    buffers = (struct buffer *) calloc (4, sizeof (*buffers));

    if (!buffers) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }

    for (n_buffers = 0; n_buffers < 4; ++n_buffers) {
        buffers[n_buffers].length = buffer_size;
        if (cuda_zero_copy) {
buffers[n_buffers].start = memalign(getpagesize (), buffers[n_buffers].length);
            //cudaMallocManaged (&buffers[n_buffers].start, buffer_size, cudaMemAttachGlobal);
        } else {
            buffers[n_buffers].start = memalign (/* boundary */ page_size,
                    buffer_size);
        }

        if (!buffers[n_buffers].start) {
            fprintf (stderr, "Out of memory\n");
            exit (EXIT_FAILURE);
        }
    }
}

static void
init_device                     (void)
{
    struct v4l2_capability cap;
    struct v4l2_cropcap cropcap;
    struct v4l2_crop crop;
    struct v4l2_format fmt;
    unsigned int min;

    if (-1 == xioctl (fd, VIDIOC_QUERYCAP, &cap)) {
        if (EINVAL == errno) {
            fprintf (stderr, "%s is no V4L2 device\n",
                    dev_name);
            exit (EXIT_FAILURE);
        } else {
            errno_exit ("VIDIOC_QUERYCAP");
        }
    }

    if (!(cap.capabilities & V4L2_CAP_VIDEO_CAPTURE)) {
        fprintf (stderr, "%s is no video capture device\n",
                dev_name);
        exit (EXIT_FAILURE);
    }

    switch (io) {
        case IO_METHOD_READ:
            if (!(cap.capabilities & V4L2_CAP_READWRITE)) {
                fprintf (stderr, "%s does not support read i/o\n",
                        dev_name);
                exit (EXIT_FAILURE);
            }

            break;

        case IO_METHOD_MMAP:
        case IO_METHOD_USERPTR:
            if (!(cap.capabilities & V4L2_CAP_STREAMING)) {
                fprintf (stderr, "%s does not support streaming i/o\n",
                        dev_name);
                exit (EXIT_FAILURE);
            }

            break;
    }

/* Select video input, video standard and tune here. */

CLEAR (cropcap);

    cropcap.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

    if (0 == xioctl (fd, VIDIOC_CROPCAP, &cropcap)) {
        crop.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        crop.c = cropcap.defrect; /* reset to default */

        if (-1 == xioctl (fd, VIDIOC_S_CROP, &crop)) {
            switch (errno) {
                case EINVAL:
                    /* Cropping not supported. */
                    break;
                default:
                    /* Errors ignored. */
                    break;
            }
        }
    } else {
        /* Errors ignored. */
    }

CLEAR (fmt);

    fmt.type                = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width       = width;
    fmt.fmt.pix.height      = height;
    fmt.fmt.pix.pixelformat = pixel_format;
    fmt.fmt.pix.field       = field;

    if (-1 == xioctl (fd, VIDIOC_S_FMT, &fmt))
        errno_exit ("VIDIOC_S_FMT");

    /* Note VIDIOC_S_FMT may change width and height. */

    /* Buggy driver paranoia. */
    min = fmt.fmt.pix.width * 2;
    if (fmt.fmt.pix.bytesperline < min)
        fmt.fmt.pix.bytesperline = min;
    min = fmt.fmt.pix.bytesperline * fmt.fmt.pix.height;
    if (fmt.fmt.pix.sizeimage < min)
        fmt.fmt.pix.sizeimage = min;

    switch (io) {
        case IO_METHOD_READ:
            init_read (fmt.fmt.pix.sizeimage);
            break;

        case IO_METHOD_MMAP:
            init_mmap ();
            break;

        case IO_METHOD_USERPTR:
            init_userp (fmt.fmt.pix.sizeimage);
            break;
    }
}

static void
close_device                    (void)
{
    if (-1 == close (fd))
        errno_exit ("close");

    fd = -1;
}

static void
open_device                     (void)
{
    struct stat st;

    if (-1 == stat (dev_name, &st)) {
        fprintf (stderr, "Cannot identify '%s': %d, %s\n",
                dev_name, errno, strerror (errno));
        exit (EXIT_FAILURE);
    }

    if (!S_ISCHR (st.st_mode)) {
        fprintf (stderr, "%s is no device\n", dev_name);
        exit (EXIT_FAILURE);
    }

    fd = open (dev_name, O_RDWR /* required */ | O_NONBLOCK, 0);

    if (-1 == fd) {
        fprintf (stderr, "Cannot open '%s': %d, %s\n",
                dev_name, errno, strerror (errno));
        exit (EXIT_FAILURE);
    }
}

static void
init_cuda                       (void)
{
    /* Check unified memory support. */
    if (cuda_zero_copy) {
        cudaDeviceProp devProp;
        cudaGetDeviceProperties (&devProp, 0);
        if (!devProp.managedMemory) {
            printf ("CUDA device does not support managed memory.\n");
            cuda_zero_copy = false;
        }
    }

    /* Allocate output buffer. */
    size_t size = width * height * 3;
    if (cuda_zero_copy) {
        cudaMallocManaged (&cuda_out_buffer, size, cudaMemAttachGlobal);
    } else {
        cuda_out_buffer = (unsigned char *) malloc (size);
    }

    cudaDeviceSynchronize ();
}

static void
usage                           (FILE *                 fp,
                                 int                    argc,
                                 char **                argv)
{
    fprintf (fp,
            "Usage: %s [options]\n\n"
            "Options:\n"
            "-c | --count N       Frame count (default: %u)\n"
            "-d | --device name   Video device name (default: %s)\n"
            "-f | --format        Capture input pixel format (default: UYVY)\n"
            "-h | --help          Print this message\n"
            "-m | --mmap          Use memory mapped buffers\n"
            "-o | --output        Output file name (default: %s)\n"
            "-s | --size WxH      Frame size (default: %ux%u)\n"
            "-u | --userp         Use application allocated buffers\n"
            "-z | --zcopy         Use zero copy CUDA memory\n"
            "Experimental options:\n"
            "-r | --read          Use read() calls\n"
            "-F | --field         Capture field (default: INTERLACED)\n"
            "",
            argv[0], count, dev_name, file_name, width, height);
}

static const char short_options [] = "c:d:f:F:hmo:rs:uz";

static const struct option
long_options [] = {
    { "count",      required_argument,      NULL,           'c' },
    { "device",     required_argument,      NULL,           'd' },
    { "format",     required_argument,      NULL,           'f' },
    { "field",      required_argument,      NULL,           'F' },
    { "help",       no_argument,            NULL,           'h' },
    { "mmap",       no_argument,            NULL,           'm' },
    { "output",     required_argument,      NULL,           'o' },
    { "read",       no_argument,            NULL,           'r' },
    { "size",       required_argument,      NULL,           's' },
    { "userp",      no_argument,            NULL,           'u' },
    { "zcopy",      no_argument,            NULL,           'z' },
    { 0, 0, 0, 0 }
};

static struct {
    const char *name;
    unsigned int fourcc;
} pixel_formats[] = {
    { "RGB332", V4L2_PIX_FMT_RGB332 },
    { "RGB555", V4L2_PIX_FMT_RGB555 },
    { "RGB565", V4L2_PIX_FMT_RGB565 },
    { "RGB555X", V4L2_PIX_FMT_RGB555X },
    { "RGB565X", V4L2_PIX_FMT_RGB565X },
    { "BGR24", V4L2_PIX_FMT_BGR24 },
    { "RGB24", V4L2_PIX_FMT_RGB24 },
    { "BGR32", V4L2_PIX_FMT_BGR32 },
    { "RGB32", V4L2_PIX_FMT_RGB32 },
    { "Y8", V4L2_PIX_FMT_GREY },
    { "Y10", V4L2_PIX_FMT_Y10 },
    { "Y12", V4L2_PIX_FMT_Y12 },
    { "Y16", V4L2_PIX_FMT_Y16 },
    { "UYVY", V4L2_PIX_FMT_UYVY },
    { "VYUY", V4L2_PIX_FMT_VYUY },
    { "YUYV", V4L2_PIX_FMT_YUYV },
    { "YVYU", V4L2_PIX_FMT_YVYU },
    { "NV12", V4L2_PIX_FMT_NV12 },
    { "NV21", V4L2_PIX_FMT_NV21 },
    { "NV16", V4L2_PIX_FMT_NV16 },
    { "NV61", V4L2_PIX_FMT_NV61 },
    { "NV24", V4L2_PIX_FMT_NV24 },
    { "NV42", V4L2_PIX_FMT_NV42 },
    { "SBGGR8", V4L2_PIX_FMT_SBGGR8 },
    { "SGBRG8", V4L2_PIX_FMT_SGBRG8 },
    { "SGRBG8", V4L2_PIX_FMT_SGRBG8 },
    { "SRGGB8", V4L2_PIX_FMT_SRGGB8 },
    { "SBGGR10_DPCM8", V4L2_PIX_FMT_SBGGR10DPCM8 },
    { "SGBRG10_DPCM8", V4L2_PIX_FMT_SGBRG10DPCM8 },
    { "SGRBG10_DPCM8", V4L2_PIX_FMT_SGRBG10DPCM8 },
    { "SRGGB10_DPCM8", V4L2_PIX_FMT_SRGGB10DPCM8 },
    { "SBGGR10", V4L2_PIX_FMT_SBGGR10 },
    { "SGBRG10", V4L2_PIX_FMT_SGBRG10 },
    { "SGRBG10", V4L2_PIX_FMT_SGRBG10 },
    { "SRGGB10", V4L2_PIX_FMT_SRGGB10 },
    { "SBGGR12", V4L2_PIX_FMT_SBGGR12 },
    { "SGBRG12", V4L2_PIX_FMT_SGBRG12 },
    { "SGRBG12", V4L2_PIX_FMT_SGRBG12 },
    { "SRGGB12", V4L2_PIX_FMT_SRGGB12 },
    { "DV", V4L2_PIX_FMT_DV },
    { "MJPEG", V4L2_PIX_FMT_MJPEG },
    { "MPEG", V4L2_PIX_FMT_MPEG },
};

static unsigned int v4l2_format_code(const char *name)
{
    unsigned int i;

    for (i = 0; i < ARRAY_SIZE(pixel_formats); ++i) {
        if (strcasecmp(pixel_formats[i].name, name) == 0)
            return pixel_formats[i].fourcc;
    }

    return 0;
}

static struct {
    const char *name;
    unsigned int field;
} fields[] = {
    { "ANY", V4L2_FIELD_ANY },
    { "NONE", V4L2_FIELD_NONE },
    { "TOP", V4L2_FIELD_TOP },
    { "BOTTOM", V4L2_FIELD_BOTTOM },
    { "INTERLACED", V4L2_FIELD_INTERLACED },
    { "SEQ_TB", V4L2_FIELD_SEQ_TB },
    { "SEQ_BT", V4L2_FIELD_SEQ_BT },
    { "ALTERNATE", V4L2_FIELD_ALTERNATE },
    { "INTERLACED_TB", V4L2_FIELD_INTERLACED_TB },
    { "INTERLACED_BT", V4L2_FIELD_INTERLACED_BT },
};

static unsigned int v4l2_field_code(const char *name)
{
    unsigned int i;

    for (i = 0; i < ARRAY_SIZE(fields); ++i) {
        if (strcasecmp(fields[i].name, name) == 0)
            return fields[i].field;
    }

    return -1;
}

int
main                            (int                    argc,
                                 char **                argv)
{
    for (;;) {
        int index;
        int c;

        c = getopt_long (argc, argv,
                short_options, long_options,
                &index);

        if (-1 == c)
            break;

        switch (c) {
            case 0: /* getopt_long() flag */
                break;

            case 'c':
                //count = atoi (optarg);
                break;

            case 'd':
                dev_name = optarg;
                break;

            case 'f':
                pixel_format = v4l2_format_code(optarg);
                if (pixel_format == 0) {
                    printf("Unsupported video format '%s'\n", optarg);
                    pixel_format = V4L2_PIX_FMT_UYVY;
                }
                break;

            case 'F':
                field = v4l2_field_code(optarg);
                if ((int)field < 0) {
                    printf("Unsupported field '%s'\n", optarg);
                    field = V4L2_FIELD_INTERLACED;
                }
                break;

            case 'h':
                usage (stdout, argc, argv);
                exit (EXIT_SUCCESS);

            case 'm':
                io = IO_METHOD_MMAP;
                break;

            case 'o':
                file_name = optarg;
                break;

            case 'r':
                io = IO_METHOD_READ;
                break;

            case 's':
                width = atoi (strtok (optarg, "x"));
                height = atoi (strtok (NULL, "x"));
                break;

            case 'u':
                io = IO_METHOD_USERPTR;
                break;

            case 'z':
                cuda_zero_copy = true;
                break;

            default:
                usage (stderr, argc, argv);
                exit (EXIT_FAILURE);
        }
    }

    bufsToSave = (void**)malloc(count * sizeof(void*));

    open_device ();

    init_device ();

    init_cuda ();

    start_capturing ();

    mainloop ();

    for(int j = 0; j < count; ++j) {
    	save_image(j, bufsToSave[j]);
    }
    free(bufsToSave);

    stop_capturing ();

    uninit_device ();

    close_device ();

    exit (EXIT_SUCCESS);

    return 0;
}

and the modified MakeFile for the opencv libs (and removing the nvcc stuff):

include ../Rules.mk

APP := capture-cuda

SRCS := \
        capture.cpp

ALL_CPPFLAGS := $(addprefix -Xcompiler ,$(filter-out -std=c++11, $(CPPFLAGS)))

all: $(APP)

capture.o: capture.cpp
        @echo "Compiling: $<"
        $(CPP) $(CPPFLAGS) -c $<

$(APP): capture.o
        @echo "Linking: $@"
        $(CPP) -o $@ $^ $(CPPFLAGS) $(LDFLAGS) -lopencv_imgproc -lopencv_core -lopencv_imgcodecs

clean:
        $(AT) rm -f *.o $(APP)

Output:

nvidia@jetson-randy2:~/tegra_multimedia_api/samples/v4l2cuda$ ./capture-cuda
DATA ERROR for buffer: 0
new pointer for buffer 0 = 0x7f6ec5c000
DATA ERROR for buffer: 1
new pointer for buffer 1 = 0x7f6dc88000
new pointer for buffer 2 = 0x7f6ccb4000
new pointer for buffer 3 = 0x7f6702d000
new pointer for buffer 0 = 0x7f66059000
new pointer for buffer 1 = 0x7f65085000
new pointer for buffer 2 = 0x7f640b1000
new pointer for buffer 3 = 0x7f630dd000
new pointer for buffer 0 = 0x7f62109000
new pointer for buffer 1 = 0x7f61135000
saving image: 0 = 0x101136000
saving image: 1 = 0x102108000
saving image: 2 = 0x1030da000
saving image: 3 = 0x1040ac000
saving image: 4 = 0x7f6ec5c000
saving image: 5 = 0x7f6dc88000
saving image: 6 = 0x7f6ccb4000
saving image: 7 = 0x7f6702d000
saving image: 8 = 0x7f66059000
saving image: 9 = 0x7f65085000

Only the first 2 images are marked with a data error, the rest are not, but you can see from the output images that were written to disk that all of the images have incomplete data.

You can download them here:
https://drive.google.com/open?id=1mwSvv4q754xyUCoxLtmtVx9P6V3BWzcP

Cheers!

Hi RS64,
Does the de-bayering issue happen to default v4l2 sample + ov5693? Or it happens in dynamic USERPTRs? Dynamic USERPTRs is not a default mode verified and we may check to support it or not.

Hi Dane,

Technically yes, it still happens. If we use the default v4l2cuda sample + ov5693, the issue does not seem like it occurs (no flickering, and we don’t save the frame data before re-queuing the buffer to verify the data). I believe there is still an underlying data race-condition which I can demonstrate without modifying the USERPTRs between dequeue and requeue.

Let us create an output vector of buffers to copy the raw frame data into, after each V4L2 buffer is dequeued, the same way we allocate the V4L2 input data buffers with cudaMallocManaged():

std::vector<void*> outputBufs;
...

static void
init_userp                      (unsigned int           buffer_size) {
...
	for( int j = 0; j < FRAMES /*10*/; ++j ) {
		void* buf = nullptr;
		if( cudaMallocManaged( &buf, buffer_size, cudaMemAttachGlobal ) != cudaSuccess) {
			printf("cudaMallocManaged failed\n");
		} else {
			outputBufs.push_back(buf);
			printf("allocated output buffer: %d = %p size = %d\n", j, buf, buffer_size);
		}
	}
	if( cudaDeviceSynchronize() != cudaSuccess) {
		printf("cudaDeviceSynchronize failed\n");
	}
...

and simply do a cudaMemcpy in process_image(), but nothing else:

static int outputBufIdx = 0;

static void
process_image                   (void *           p)
{
    printf ("CUDA copy %p -> %p\n", p, outputBufs[outputBufIdx]);
    if( cudaMemcpy( outputBufs[outputBufIdx], p, width * height * 2, cudaMemcpyDefault ) != cudaSuccess) {
    	printf( "memcpy failed\n" );
    }
    ++outputBufIdx;
}

At the end of the application, we will de-bayer and save these images to disk. If we present motion to the camera, i.e. waving your hand infront of the lens, we see torn frames or partial data. Give it a try, here is the full code with changes marked as:

/*** FOR TESTING /

/
FOR TESTING ***/

with linker flags -lopencv_imgproc -lopencv_core -lopencv_imgcodecs

/*
 * Copyright (c) 2015, NVIDIA CORPORATION. All rights reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
 */

/*
 *  V4L2 video capture example
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#include <getopt.h>             /* getopt_long() */

#include <fcntl.h>              /* low-level i/o */
#include <unistd.h>
#include <errno.h>
#include <malloc.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <sys/ioctl.h>

#include <asm/types.h>          /* for videodev2.h */

#include <linux/videodev2.h>

#include <cuda_runtime.h>
/*** FOR TESTING ***/
//#include "yuv2rgb.cuh"
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>
/*** FOR TESTING ***/

#define CLEAR(x) memset (&(x), 0, sizeof (x))
#define ARRAY_SIZE(a)   (sizeof(a)/sizeof((a)[0]))

typedef enum {
    IO_METHOD_READ,
    IO_METHOD_MMAP,
    IO_METHOD_USERPTR,
} io_method;

struct buffer {
    void *                  start;
    size_t                  length;
};

/*** FOR TESTING ***/
static const unsigned int FRAMES = 10;

static const char *     dev_name = "/dev/video0";
static io_method        io = IO_METHOD_USERPTR;
static int              fd = -1;
struct buffer *         buffers = NULL;
static unsigned int     n_buffers = 0;
static unsigned int     width = 2688;
static unsigned int     height = 1944;
static unsigned int     count = FRAMES;
static unsigned char *  cuda_out_buffer = NULL;
static bool             cuda_zero_copy = true;
static const char *     file_name = "out.ppm";
static unsigned int     pixel_format = V4L2_PIX_FMT_SBGGR10;
static unsigned int     field = V4L2_FIELD_NONE;

std::vector<void*> outputBufs;
static int outputBufIdx = 0;

void save_image( int i, void* ptr )
{
	printf( "saving image: %d = %p\n", i, ptr );
	cv::Mat inputMat_16UC1 = cv::Mat( height, width, CV_16UC1, ptr );
	cv::Mat outputMat_16UC3 = cv::Mat( height, width, CV_16UC3 );
	// convert RG10 to RGB 16-bit
	cvtColor( inputMat_16UC1, outputMat_16UC3, cv::COLOR_BayerRG2RGB );

	// scale it to 8-bit
	cv::Mat outputMat_8UC3 = cv::Mat( height, width, CV_8UC3 );
	outputMat_16UC3.convertTo( outputMat_8UC3, CV_8UC3, 0.015625 );

	std::string path = std::string( "/home/nvidia/image_" ) + std::to_string( i ) + std::string( ".bmp" );
	cv::imwrite( path, outputMat_8UC3 );
}
/*** FOR TESTING ***/

static void
errno_exit                      (const char *           s)
{
    fprintf (stderr, "%s error %d, %s\n",
            s, errno, strerror (errno));

    exit (EXIT_FAILURE);
}

static int
xioctl                          (int                    fd,
                                 int                    request,
                                 void *                 arg)
{
    int r;

    do r = ioctl (fd, request, arg);
    while (-1 == r && EINTR == errno);

    return r;
}

static void
process_image                   (void *           p)
{
    printf ("CUDA copy %p -> %p\n", p, outputBufs[outputBufIdx]);
    /*** FOR TESTING ***/
    //gpuConvertYUYVtoRGB ((unsigned char *) p, cuda_out_buffer, width, height);
    if( cudaMemcpy( outputBufs[outputBufIdx], p, width * height * 2, cudaMemcpyDefault ) != cudaSuccess) {
    	printf( "memcpy failed\n" );
    }
    ++outputBufIdx;
#if 0
    /* Save image. */
    if (count == 0) {
        FILE *fp = fopen (file_name, "wb");
        fprintf (fp, "P6\n%u %u\n255\n", width, height);
        fwrite (cuda_out_buffer, 1, width * height * 3, fp);
        fclose (fp);
    }
#endif
    /*** FOR TESTING ***/
}

static int
read_frame                      (void)
{
    struct v4l2_buffer buf;
    unsigned int i;

    switch (io) {
        case IO_METHOD_READ:
            if (-1 == read (fd, buffers[0].start, buffers[0].length)) {
                switch (errno) {
                    case EAGAIN:
                        return 0;

                    case EIO:
                        /* Could ignore EIO, see spec. */

                        /* fall through */

                    default:
                        errno_exit ("read");
                }
            }

            process_image (buffers[0].start);

            break;

        case IO_METHOD_MMAP:
            CLEAR (buf);

            buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            buf.memory = V4L2_MEMORY_MMAP;

            if (-1 == xioctl (fd, VIDIOC_DQBUF, &buf)) {
                switch (errno) {
                    case EAGAIN:
                        return 0;

                    case EIO:
                        /* Could ignore EIO, see spec. */

                        /* fall through */

                    default:
                        errno_exit ("VIDIOC_DQBUF");
                }
            }

            assert (buf.index < n_buffers);

            process_image (buffers[buf.index].start);

            if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                errno_exit ("VIDIOC_QBUF");

            break;

        case IO_METHOD_USERPTR:
            CLEAR (buf);

            buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
            buf.memory = V4L2_MEMORY_USERPTR;

            if (-1 == xioctl (fd, VIDIOC_DQBUF, &buf)) {
                switch (errno) {
                    case EAGAIN:
                        return 0;

                    case EIO:
                        /* Could ignore EIO, see spec. */

                        /* fall through */

                    default:
                        errno_exit ("VIDIOC_DQBUF");
                }
            }

            for (i = 0; i < n_buffers; ++i)
                if (buf.m.userptr == (unsigned long) buffers[i].start
                        && buf.length == buffers[i].length)
                    break;

            assert (i < n_buffers);

            /*** FOR TESTING ***/
			if( buf.flags & V4L2_BUF_FLAG_ERROR ) {
				printf( "DATA ERROR for buffer: %d\n", buf.index );
			}
            /*** FOR TESTING ***/
			process_image ((void *) buf.m.userptr);

            if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                errno_exit ("VIDIOC_QBUF");

            break;
    }

    return 1;
}

static void
mainloop                        (void)
{
    while (count-- > 0) {
        for (;;) {
            fd_set fds;
            struct timeval tv;
            int r;

            FD_ZERO (&fds);
            FD_SET (fd, &fds);

            /* Timeout. */
            tv.tv_sec = 2;
            tv.tv_usec = 0;

            r = select (fd + 1, &fds, NULL, NULL, &tv);

            if (-1 == r) {
                if (EINTR == errno)
                    continue;

                errno_exit ("select");
            }

            if (0 == r) {
                fprintf (stderr, "select timeout\n");
                exit (EXIT_FAILURE);
            }

            if (read_frame ())
                break;

            /* EAGAIN - continue select loop. */
        }
    }
}

static void
stop_capturing                  (void)
{
    enum v4l2_buf_type type;

    switch (io) {
        case IO_METHOD_READ:
            /* Nothing to do. */
            break;

        case IO_METHOD_MMAP:
        case IO_METHOD_USERPTR:
            type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

            if (-1 == xioctl (fd, VIDIOC_STREAMOFF, &type))
                errno_exit ("VIDIOC_STREAMOFF");

            break;
    }
}

static void
start_capturing                 (void)
{
    unsigned int i;
    enum v4l2_buf_type type;

    switch (io) {
        case IO_METHOD_READ:
            /* Nothing to do. */
            break;

        case IO_METHOD_MMAP:
            for (i = 0; i < n_buffers; ++i) {
                struct v4l2_buffer buf;

                CLEAR (buf);

                buf.type        = V4L2_BUF_TYPE_VIDEO_CAPTURE;
                buf.memory      = V4L2_MEMORY_MMAP;
                buf.index       = i;

                if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                    errno_exit ("VIDIOC_QBUF");
            }

            type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

            if (-1 == xioctl (fd, VIDIOC_STREAMON, &type))
                errno_exit ("VIDIOC_STREAMON");

            break;

        case IO_METHOD_USERPTR:
            for (i = 0; i < n_buffers; ++i) {
                struct v4l2_buffer buf;

                CLEAR (buf);

                buf.type        = V4L2_BUF_TYPE_VIDEO_CAPTURE;
                buf.memory      = V4L2_MEMORY_USERPTR;
                buf.index       = i;
                buf.m.userptr   = (unsigned long) buffers[i].start;
                buf.length      = buffers[i].length;

                if (-1 == xioctl (fd, VIDIOC_QBUF, &buf))
                    errno_exit ("VIDIOC_QBUF");
            }

            type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

            if (-1 == xioctl (fd, VIDIOC_STREAMON, &type))
                errno_exit ("VIDIOC_STREAMON");

            break;
    }
}

static void
uninit_device                   (void)
{
    unsigned int i;

    switch (io) {
        case IO_METHOD_READ:
            free (buffers[0].start);
            break;

        case IO_METHOD_MMAP:
            for (i = 0; i < n_buffers; ++i)
                if (-1 == munmap (buffers[i].start, buffers[i].length))
                    errno_exit ("munmap");
            break;

        case IO_METHOD_USERPTR:
            for (i = 0; i < n_buffers; ++i) {
                if (cuda_zero_copy) {
                    cudaFree (buffers[i].start);
                } else {
                    free (buffers[i].start);
                }
            }
            break;
    }

    free (buffers);

    if (cuda_zero_copy) {
        cudaFree (cuda_out_buffer);
    }
}

static void
init_read                       (unsigned int           buffer_size)
{
    buffers = (struct buffer *) calloc (1, sizeof (*buffers));

    if (!buffers) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }

    buffers[0].length = buffer_size;
    buffers[0].start = malloc (buffer_size);

    if (!buffers[0].start) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }
}

static void
init_mmap                       (void)
{
    struct v4l2_requestbuffers req;

    CLEAR (req);

    req.count               = 4;
    req.type                = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory              = V4L2_MEMORY_MMAP;

    if (-1 == xioctl (fd, VIDIOC_REQBUFS, &req)) {
        if (EINVAL == errno) {
            fprintf (stderr, "%s does not support "
                    "memory mapping\n", dev_name);
            exit (EXIT_FAILURE);
        } else {
            errno_exit ("VIDIOC_REQBUFS");
        }
    }

    if (req.count < 2) {
        fprintf (stderr, "Insufficient buffer memory on %s\n",
                dev_name);
        exit (EXIT_FAILURE);
    }

    buffers = (struct buffer *) calloc (req.count, sizeof (*buffers));

    if (!buffers) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }

    for (n_buffers = 0; n_buffers < req.count; ++n_buffers) {
        struct v4l2_buffer buf;

        CLEAR (buf);

        buf.type        = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory      = V4L2_MEMORY_MMAP;
        buf.index       = n_buffers;

        if (-1 == xioctl (fd, VIDIOC_QUERYBUF, &buf))
            errno_exit ("VIDIOC_QUERYBUF");

        buffers[n_buffers].length = buf.length;
        buffers[n_buffers].start =
            mmap (NULL /* start anywhere */,
                    buf.length,
                    PROT_READ | PROT_WRITE /* required */,
                    MAP_SHARED /* recommended */,
                    fd, buf.m.offset);

        if (MAP_FAILED == buffers[n_buffers].start)
            errno_exit ("mmap");
    }
}

static void
init_userp                      (unsigned int           buffer_size)
{
    struct v4l2_requestbuffers req;
    unsigned int page_size;

    page_size = getpagesize ();
    buffer_size = (buffer_size + page_size - 1) & ~(page_size - 1);

    CLEAR (req);

    req.count               = 4;
    req.type                = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    req.memory              = V4L2_MEMORY_USERPTR;

    if (-1 == xioctl (fd, VIDIOC_REQBUFS, &req)) {
        if (EINVAL == errno) {
            fprintf (stderr, "%s does not support "
                    "user pointer i/o\n", dev_name);
            exit (EXIT_FAILURE);
        } else {
            errno_exit ("VIDIOC_REQBUFS");
        }
    }

    buffers = (struct buffer *) calloc (4, sizeof (*buffers));

    if (!buffers) {
        fprintf (stderr, "Out of memory\n");
        exit (EXIT_FAILURE);
    }

    for (n_buffers = 0; n_buffers < 4; ++n_buffers) {
        buffers[n_buffers].length = buffer_size;
        if (cuda_zero_copy) {
            cudaMallocManaged (&buffers[n_buffers].start, buffer_size, cudaMemAttachGlobal);
        } else {
            buffers[n_buffers].start = memalign (/* boundary */ page_size,
                    buffer_size);
        }

        if (!buffers[n_buffers].start) {
            fprintf (stderr, "Out of memory\n");
            exit (EXIT_FAILURE);
        }
    }

    /*** FOR TESTING ***/
	for( int j = 0; j < FRAMES; ++j ) {
		void* buf = nullptr;
		if( cudaMallocManaged( &buf, buffer_size, cudaMemAttachGlobal ) != cudaSuccess) {
			printf("cudaMallocManaged failed\n");
		} else {
			outputBufs.push_back(buf);
			printf("allocated output buffer: %d = %p size = %d\n", j, buf, buffer_size);
		}
	}
	if( cudaDeviceSynchronize() != cudaSuccess) {
		printf("cudaDeviceSynchronize failed\n");
	}
	/*** FOR TESTING ***/
}

static void
init_device                     (void)
{
    struct v4l2_capability cap;
    struct v4l2_cropcap cropcap;
    struct v4l2_crop crop;
    struct v4l2_format fmt;
    unsigned int min;

    if (-1 == xioctl (fd, VIDIOC_QUERYCAP, &cap)) {
        if (EINVAL == errno) {
            fprintf (stderr, "%s is no V4L2 device\n",
                    dev_name);
            exit (EXIT_FAILURE);
        } else {
            errno_exit ("VIDIOC_QUERYCAP");
        }
    }

    if (!(cap.capabilities & V4L2_CAP_VIDEO_CAPTURE)) {
        fprintf (stderr, "%s is no video capture device\n",
                dev_name);
        exit (EXIT_FAILURE);
    }

    switch (io) {
        case IO_METHOD_READ:
            if (!(cap.capabilities & V4L2_CAP_READWRITE)) {
                fprintf (stderr, "%s does not support read i/o\n",
                        dev_name);
                exit (EXIT_FAILURE);
            }

            break;

        case IO_METHOD_MMAP:
        case IO_METHOD_USERPTR:
            if (!(cap.capabilities & V4L2_CAP_STREAMING)) {
                fprintf (stderr, "%s does not support streaming i/o\n",
                        dev_name);
                exit (EXIT_FAILURE);
            }

            break;
    }

/* Select video input, video standard and tune here. */

CLEAR (cropcap);

    cropcap.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

    if (0 == xioctl (fd, VIDIOC_CROPCAP, &cropcap)) {
        crop.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        crop.c = cropcap.defrect; /* reset to default */

        if (-1 == xioctl (fd, VIDIOC_S_CROP, &crop)) {
            switch (errno) {
                case EINVAL:
                    /* Cropping not supported. */
                    break;
                default:
                    /* Errors ignored. */
                    break;
            }
        }
    } else {
        /* Errors ignored. */
    }

CLEAR (fmt);

    fmt.type                = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width       = width;
    fmt.fmt.pix.height      = height;
    fmt.fmt.pix.pixelformat = pixel_format;
    fmt.fmt.pix.field       = field;

    if (-1 == xioctl (fd, VIDIOC_S_FMT, &fmt))
        errno_exit ("VIDIOC_S_FMT");

    /* Note VIDIOC_S_FMT may change width and height. */

    /* Buggy driver paranoia. */
    min = fmt.fmt.pix.width * 2;
    if (fmt.fmt.pix.bytesperline < min)
        fmt.fmt.pix.bytesperline = min;
    min = fmt.fmt.pix.bytesperline * fmt.fmt.pix.height;
    if (fmt.fmt.pix.sizeimage < min)
        fmt.fmt.pix.sizeimage = min;

    switch (io) {
        case IO_METHOD_READ:
            init_read (fmt.fmt.pix.sizeimage);
            break;

        case IO_METHOD_MMAP:
            init_mmap ();
            break;

        case IO_METHOD_USERPTR:
            init_userp (fmt.fmt.pix.sizeimage);
            break;
    }
}

static void
close_device                    (void)
{
    if (-1 == close (fd))
        errno_exit ("close");

    fd = -1;
}

static void
open_device                     (void)
{
    struct stat st;

    if (-1 == stat (dev_name, &st)) {
        fprintf (stderr, "Cannot identify '%s': %d, %s\n",
                dev_name, errno, strerror (errno));
        exit (EXIT_FAILURE);
    }

    if (!S_ISCHR (st.st_mode)) {
        fprintf (stderr, "%s is no device\n", dev_name);
        exit (EXIT_FAILURE);
    }

    fd = open (dev_name, O_RDWR /* required */ | O_NONBLOCK, 0);

    if (-1 == fd) {
        fprintf (stderr, "Cannot open '%s': %d, %s\n",
                dev_name, errno, strerror (errno));
        exit (EXIT_FAILURE);
    }
}

static void
init_cuda                       (void)
{
    /* Check unified memory support. */
    if (cuda_zero_copy) {
        cudaDeviceProp devProp;
        cudaGetDeviceProperties (&devProp, 0);
        if (!devProp.managedMemory) {
            printf ("CUDA device does not support managed memory.\n");
            cuda_zero_copy = false;
        }
    }

    /* Allocate output buffer. */
    size_t size = width * height * 3;
    if (cuda_zero_copy) {
        cudaMallocManaged (&cuda_out_buffer, size, cudaMemAttachGlobal);
    } else {
        cuda_out_buffer = (unsigned char *) malloc (size);
    }

    cudaDeviceSynchronize ();
}

static void
usage                           (FILE *                 fp,
                                 int                    argc,
                                 char **                argv)
{
    fprintf (fp,
            "Usage: %s [options]\n\n"
            "Options:\n"
            "-c | --count N       Frame count (default: %u)\n"
            "-d | --device name   Video device name (default: %s)\n"
            "-f | --format        Capture input pixel format (default: UYVY)\n"
            "-h | --help          Print this message\n"
            "-m | --mmap          Use memory mapped buffers\n"
            "-o | --output        Output file name (default: %s)\n"
            "-s | --size WxH      Frame size (default: %ux%u)\n"
            "-u | --userp         Use application allocated buffers\n"
            "-z | --zcopy         Use zero copy CUDA memory\n"
            "Experimental options:\n"
            "-r | --read          Use read() calls\n"
            "-F | --field         Capture field (default: INTERLACED)\n"
            "",
            argv[0], count, dev_name, file_name, width, height);
}

static const char short_options [] = "c:d:f:F:hmo:rs:uz";

static const struct option
long_options [] = {
    { "count",      required_argument,      NULL,           'c' },
    { "device",     required_argument,      NULL,           'd' },
    { "format",     required_argument,      NULL,           'f' },
    { "field",      required_argument,      NULL,           'F' },
    { "help",       no_argument,            NULL,           'h' },
    { "mmap",       no_argument,            NULL,           'm' },
    { "output",     required_argument,      NULL,           'o' },
    { "read",       no_argument,            NULL,           'r' },
    { "size",       required_argument,      NULL,           's' },
    { "userp",      no_argument,            NULL,           'u' },
    { "zcopy",      no_argument,            NULL,           'z' },
    { 0, 0, 0, 0 }
};

static struct {
    const char *name;
    unsigned int fourcc;
} pixel_formats[] = {
    { "RGB332", V4L2_PIX_FMT_RGB332 },
    { "RGB555", V4L2_PIX_FMT_RGB555 },
    { "RGB565", V4L2_PIX_FMT_RGB565 },
    { "RGB555X", V4L2_PIX_FMT_RGB555X },
    { "RGB565X", V4L2_PIX_FMT_RGB565X },
    { "BGR24", V4L2_PIX_FMT_BGR24 },
    { "RGB24", V4L2_PIX_FMT_RGB24 },
    { "BGR32", V4L2_PIX_FMT_BGR32 },
    { "RGB32", V4L2_PIX_FMT_RGB32 },
    { "Y8", V4L2_PIX_FMT_GREY },
    { "Y10", V4L2_PIX_FMT_Y10 },
    { "Y12", V4L2_PIX_FMT_Y12 },
    { "Y16", V4L2_PIX_FMT_Y16 },
    { "UYVY", V4L2_PIX_FMT_UYVY },
    { "VYUY", V4L2_PIX_FMT_VYUY },
    { "YUYV", V4L2_PIX_FMT_YUYV },
    { "YVYU", V4L2_PIX_FMT_YVYU },
    { "NV12", V4L2_PIX_FMT_NV12 },
    { "NV21", V4L2_PIX_FMT_NV21 },
    { "NV16", V4L2_PIX_FMT_NV16 },
    { "NV61", V4L2_PIX_FMT_NV61 },
    { "NV24", V4L2_PIX_FMT_NV24 },
    { "NV42", V4L2_PIX_FMT_NV42 },
    { "SBGGR8", V4L2_PIX_FMT_SBGGR8 },
    { "SGBRG8", V4L2_PIX_FMT_SGBRG8 },
    { "SGRBG8", V4L2_PIX_FMT_SGRBG8 },
    { "SRGGB8", V4L2_PIX_FMT_SRGGB8 },
    { "SBGGR10_DPCM8", V4L2_PIX_FMT_SBGGR10DPCM8 },
    { "SGBRG10_DPCM8", V4L2_PIX_FMT_SGBRG10DPCM8 },
    { "SGRBG10_DPCM8", V4L2_PIX_FMT_SGRBG10DPCM8 },
    { "SRGGB10_DPCM8", V4L2_PIX_FMT_SRGGB10DPCM8 },
    { "SBGGR10", V4L2_PIX_FMT_SBGGR10 },
    { "SGBRG10", V4L2_PIX_FMT_SGBRG10 },
    { "SGRBG10", V4L2_PIX_FMT_SGRBG10 },
    { "SRGGB10", V4L2_PIX_FMT_SRGGB10 },
    { "SBGGR12", V4L2_PIX_FMT_SBGGR12 },
    { "SGBRG12", V4L2_PIX_FMT_SGBRG12 },
    { "SGRBG12", V4L2_PIX_FMT_SGRBG12 },
    { "SRGGB12", V4L2_PIX_FMT_SRGGB12 },
    { "DV", V4L2_PIX_FMT_DV },
    { "MJPEG", V4L2_PIX_FMT_MJPEG },
    { "MPEG", V4L2_PIX_FMT_MPEG },
};

static unsigned int v4l2_format_code(const char *name)
{
    unsigned int i;

    for (i = 0; i < ARRAY_SIZE(pixel_formats); ++i) {
        if (strcasecmp(pixel_formats[i].name, name) == 0)
            return pixel_formats[i].fourcc;
    }

    return 0;
}

static struct {
    const char *name;
    unsigned int field;
} fields[] = {
    { "ANY", V4L2_FIELD_ANY },
    { "NONE", V4L2_FIELD_NONE },
    { "TOP", V4L2_FIELD_TOP },
    { "BOTTOM", V4L2_FIELD_BOTTOM },
    { "INTERLACED", V4L2_FIELD_INTERLACED },
    { "SEQ_TB", V4L2_FIELD_SEQ_TB },
    { "SEQ_BT", V4L2_FIELD_SEQ_BT },
    { "ALTERNATE", V4L2_FIELD_ALTERNATE },
    { "INTERLACED_TB", V4L2_FIELD_INTERLACED_TB },
    { "INTERLACED_BT", V4L2_FIELD_INTERLACED_BT },
};

static unsigned int v4l2_field_code(const char *name)
{
    unsigned int i;

    for (i = 0; i < ARRAY_SIZE(fields); ++i) {
        if (strcasecmp(fields[i].name, name) == 0)
            return fields[i].field;
    }

    return -1;
}

int
main                            (int                    argc,
                                 char **                argv)
{
    for (;;) {
        int index;
        int c;

        c = getopt_long (argc, argv,
                short_options, long_options,
                &index);

        if (-1 == c)
            break;

        switch (c) {
            case 0: /* getopt_long() flag */
                break;

            case 'c':
                count = atoi (optarg);
                break;

            case 'd':
                dev_name = optarg;
                break;

            case 'f':
                pixel_format = v4l2_format_code(optarg);
                if (pixel_format == 0) {
                    printf("Unsupported video format '%s'\n", optarg);
                    pixel_format = V4L2_PIX_FMT_UYVY;
                }
                break;

            case 'F':
                field = v4l2_field_code(optarg);
                if ((int)field < 0) {
                    printf("Unsupported field '%s'\n", optarg);
                    field = V4L2_FIELD_INTERLACED;
                }
                break;

            case 'h':
                usage (stdout, argc, argv);
                exit (EXIT_SUCCESS);

            case 'm':
                io = IO_METHOD_MMAP;
                break;

            case 'o':
                file_name = optarg;
                break;

            case 'r':
                io = IO_METHOD_READ;
                break;

            case 's':
                width = atoi (strtok (optarg, "x"));
                height = atoi (strtok (NULL, "x"));
                break;

            case 'u':
                io = IO_METHOD_USERPTR;
                break;

            case 'z':
                cuda_zero_copy = true;
                break;

            default:
                usage (stderr, argc, argv);
                exit (EXIT_FAILURE);
        }
    }

    open_device ();

    init_device ();

    init_cuda ();

    start_capturing ();

    mainloop ();

    stop_capturing ();

    uninit_device ();

    close_device ();

    /*** FOR TESTING ***/
	for( int t = 0; t < outputBufs.size(); ++t) {
		save_image( t, outputBufs[t] );
		printf("free output buffer: %p\n", outputBufs[t]);
		cudaFree(outputBufs[t]);
	}
	cudaDeviceSynchronize();
	/*** FOR TESTING ***/

    exit (EXIT_SUCCESS);

    return 0;
}

Testing this with another camera, we see the same behaviour using an IMX378 (3840x2160) @ 10-bit. Every image has a similar artifact. Image link in next post.

Note: if we add a usleep(33000); as the first line in process_image(), we do not see this problem and the frame looks correct. Hence, I suspect the data is still in transit after the buffer is dequeued.

Thanks!

Here is the torn image: https://drive.google.com/open?id=1gYtF7_qu0vbAQlrq_u2GXfrOc-YWhtwZ

Hi,
It looks to be a race cojdition in vi driver. Please apply the patches of camera and try again.
[url]https://elinux.org/Jetson_TX2/28.2.1_patches[/url]

Hi Dane,

I had a look at the patches. It looks like the patch in the wiki from

https://devtalk.nvidia.com/default/topic/1038067/jetson-tx2/data-error-of-csi-camera-capture-on-jetson-tx2/post/5274769/#5274769

specifically, 0001-drivers-camera-fix-FE-syncpt-wait.patch

solves the issue. I tested this both switching the userptr and leaving it be with an immediate memcpy in process_image(). In both cases the frames appear complete. I will test this further, but at the moment it seems to work.

I appreciate your time looking into this,
Cheers!