Multimedia API VideoConverter

I am using “/dev/nvhost-vic” to convert UYVY frames to ABGR, like what’s done in the sample 07_video_convert, but with a slight modification: I use 6 threads to feed in YUV frames.

I removed conv1 and registered only one callback on conv0:

conv0_capture_dqbuf_thread_callback(struct v4l2_buffer *v4l2_buf,
                                    NvBuffer *buffer, NvBuffer *shared_buffer,
                                    void *arg)

So the 6 YUV frames (from 6 threads) are queued to the video converter. When the callback returns 6 times, how can I tell which ABGR frame came from which YUV frame? Is there any data field in the v4l2_buffer that I can use to map an output back to its input YUV buffer?

Hi HooverLv,
Please assign a timestamp to each v4l2_buffer. A similar example is at tegra_multimedia_api/samples/backend

Hi, DaneLLL,

Thank you, the timestamp is useful; I can map each output to its input now.

Then I used gettimeofday() and the timestamps to check the performance. The result is not good. Converting one channel of 720×576 UYVY frames takes 4000~6000 µs per frame. When I stream in 7 channels of PAL video, each channel’s delay is 40~60 ms.

Below is my init code:

ctx->in_width  = 720;
ctx->in_height = 576;
ctx->in_pixfmt = V4L2_PIX_FMT_UYVY;

ctx->out_buftype = ctx->in_buftype = BUF_TYPE_RAW;

ctx->out_width  = 720;
ctx->out_height = 576;
ctx->out_pixfmt = V4L2_PIX_FMT_ABGR32;

ctx->conv0 = NvVideoConverter::createVideoConverter("conv0");

// Set conv0 output plane format (the UYVY input)
ret = ctx->conv0->setOutputPlaneFormat(ctx->in_pixfmt, ctx->in_width,
                ctx->in_height, V4L2_NV_BUFFER_LAYOUT_PITCH);
assert(ret >= 0);

// Set conv0 capture plane format (the ABGR32 output)
ret = ctx->conv0->setCapturePlaneFormat(ctx->out_pixfmt, ctx->out_width,
                ctx->out_height, V4L2_NV_BUFFER_LAYOUT_PITCH);
assert(ret >= 0);

// Request 7 USERPTR buffers on each plane
ret = ctx->conv0->output_plane.setupPlane(V4L2_MEMORY_USERPTR, 7, false, true);
assert(ret >= 0);

ret = ctx->conv0->capture_plane.setupPlane(V4L2_MEMORY_USERPTR, 7, false, true);
assert(ret >= 0);

// conv0 output plane STREAMON
ret = ctx->conv0->output_plane.setStreamStatus(true);
assert(ret >= 0);

// conv0 capture plane STREAMON
ret = ctx->conv0->capture_plane.setStreamStatus(true);
assert(ret >= 0);

… …

Hi HooverLv,
The per-frame conversion time looks identical in the one-channel and 7-channel cases. For one channel it is ~200 fps (~5 ms per frame), so ideally for 7 channels you should get ~28 fps (~35 ms per frame), but you get 40~60 ms. That is a little slower, probably due to multi-threading overhead. This may be the maximum performance of the HW converter. How is the performance of OpenCV’s cv::cvtColor? Is it better than the HW converter?

Hi, DaneLLL,

With gettimeofday(), cv::cvtColor takes 2~4 ms per frame, even when 7 channels are running simultaneously. That is better than the HW converter’s 5 ms, maybe because OpenCV uses all 4 CPU cores while the HW converter processes frames sequentially? However, I do see a 20% drop in CPU load when the HW converter is running.

Hi HooverLv,
I did a quick check of the HW converter’s performance and the result is close to yours, so it looks like the HW converter does not bring you enough performance. As you said, there is only one HW converter and it has to process incoming frames sequentially.

So for your case, please try whether cv::cvtColor performs better with the CPUs always running at max frequency:
sudo ./tegrastats -max

I have offloaded the color-space conversion to the GPU. Now the CPU part takes only ~800 µs.

Could you please explain a little more?