Using libargus on TX1 without X

We are currently working with nvcamerasrc and gstreamer in our applications. We are investigating switching to libargus, but it looks like there is a hard dependency on EGLStreams, which in turn creates a dependency on X11, which chews up over 1GB of scarce system resources. I hope I am missing something - is there any way to use libargus on an embedded TX1/TX2 that has no display and no X overhead?

All of the examples I've looked at in the tegra_multimedia_api directory have a dependency on X. I found a forum post on this topic that suggests running X remotely, which does not help us, since in our use case there is no remote system to display on.

Our flow is camera -> encoder -> file or camera -> encoder -> live stream. We never have a use case with a local display.

Hi sperok,
Please try the following (reboot the system first):

nvidia@tegra-ubuntu:~/tegra_multimedia_api/samples/10_camera_recording$ sudo service lightdm stop
nvidia@tegra-ubuntu:~/tegra_multimedia_api/samples/10_camera_recording$ ./camera_recording

DaneLLL - That example is V4L2-based. Is there an example on 28.1 that has argus feeding a gstreamer pipeline without using X?

Hi sperok,
Please refer to https://devtalk.nvidia.com/default/topic/1028387/jetson-tx1/closed-gst-encoding-pipeline-with-frame-processing-using-cuda-and-libargus/post/5232036/#5232036

DaneLLL - The patch in that post fails to apply on 28.2, with hunk 5 rejected. Here is the reject file:

--- samples/10_camera_recording/main.cpp
+++ samples/10_camera_recording/main.cpp
@@ -266,10 +279,20 @@ bool ConsumerThread::threadExecute()
             ORIGINATE_ERROR("IImageNativeBuffer not supported by Image.");
         fd = iNativeBuffer->createNvBuffer(STREAM_SIZE,
                                            NvBufferColorFormat_YUV420,
-                                           NvBufferLayout_BlockLinear);
+                                           NvBufferLayout_Pitch);
         if (VERBOSE_ENABLE)
             CONSUMER_PRINT("Acquired Frame. %d\n", fd);
 
+        EGLImageKHR egl_image = NULL;
+        egl_image = NvEGLImageFromFd(egl_display, fd);
+        if (egl_image == NULL)
+        {
+            fprintf(stderr, "Error while mapping dmabuf fd (0x%X) to EGLImage\n",
+                     fd);
+        }
+        HandleEGLImage(&egl_image);
+        NvDestroyEGLImage(egl_display, egl_image);
+
         // Push the frame into V4L2.
         v4l2_buf.m.planes[0].m.fd = fd;
         v4l2_buf.m.planes[0].bytesused = 1; // byteused must be non-zero

And here is the code currently in main.cpp in the offending area. All of the lines in the if (DO_CPU_PROCESS) block are new in 28.2 compared to 28.1.

        // Get the IImageNativeBuffer extension interface and create the fd.
        NV::IImageNativeBuffer *iNativeBuffer =
            interface_cast<NV::IImageNativeBuffer>(iFrame->getImage());
        if (!iNativeBuffer)
            ORIGINATE_ERROR("IImageNativeBuffer not supported by Image.");
        fd = iNativeBuffer->createNvBuffer(STREAM_SIZE,
                                           NvBufferColorFormat_YUV420,
                                           (DO_CPU_PROCESS)?NvBufferLayout_Pitch:NvBufferLayout_BlockLinear);
        if (VERBOSE_ENABLE)
            CONSUMER_PRINT("Acquired Frame. %d\n", fd);

        if (DO_CPU_PROCESS) {
            NvBufferParams par;
            NvBufferGetParams (fd, &par);
            void *ptr_y;
            uint8_t *ptr_cur;
            int i, j, a, b;
            NvBufferMemMap(fd, Y_INDEX, NvBufferMem_Write, &ptr_y);
            NvBufferMemSyncForCpu(fd, Y_INDEX, &ptr_y);
            ptr_cur = (uint8_t *)ptr_y + par.pitch[Y_INDEX]*START_POS + START_POS;

            // overwrite some pixels to put an 'N' on each Y plane
            // scan array_n to decide which pixel should be overwritten
            for (i=0; i < FONT_SIZE; i++) {
                for (j=0; j < FONT_SIZE; j++) {
                    a = i>>SHIFT_BITS;
                    b = j>>SHIFT_BITS;
                    if (array_n[a][b])
                        (*ptr_cur) = 0xff; // white color
                    ptr_cur++;
                }
                ptr_cur = (uint8_t *)ptr_y + par.pitch[Y_INDEX]*(START_POS + i)  + START_POS;
            }
            NvBufferMemSyncForDevice (fd, Y_INDEX, &ptr_y);
            NvBufferMemUnMap(fd, Y_INDEX, &ptr_y);
        }

        // Push the frame into V4L2.
        v4l2_buf.m.planes[0].m.fd = fd;
        v4l2_buf.m.planes[0].bytesused = 1; // bytesused must be non-zero
        CHECK_ERROR(m_VideoEncoder->output_plane.qBuffer(v4l2_buf, NULL));
    }

Hi sperok,
Are you having difficulty merging the conflict? It seems not too hard to do yourself…

I just inserted the lines in front of the “DO_CPU_PROCESS” block and it appeared to work. For reference, the merged section now reads roughly as follows (egl_display and HandleEGLImage() come from the other hunks of the patch):
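        fd = iNativeBuffer->createNvBuffer(STREAM_SIZE,
                                           NvBufferColorFormat_YUV420,
                                           (DO_CPU_PROCESS)?NvBufferLayout_Pitch:NvBufferLayout_BlockLinear);
        if (VERBOSE_ENABLE)
            CONSUMER_PRINT("Acquired Frame. %d\n", fd);

        // Lines from the rejected hunk, inserted by hand.
        EGLImageKHR egl_image = NvEGLImageFromFd(egl_display, fd);
        if (egl_image == NULL)
        {
            fprintf(stderr, "Error while mapping dmabuf fd (0x%X) to EGLImage\n",
                    fd);
        }
        HandleEGLImage(&egl_image);
        NvDestroyEGLImage(egl_display, egl_image);

        if (DO_CPU_PROCESS) {
            // ... unchanged 28.2 CPU-processing block ...
        }

The video file gets recorded correctly, but there are a variety of warnings and errors when the program starts and later exits: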

./camera_recording -d 10 -s
Set governor to performance before enabling profiler
OFParserGetVirtualDevice: virtual device driver node not found in proc device-tree
OFParserGetVirtualDevice: virtual device driver node not found in proc device-tree
LoadOverridesFile: looking for override file [/Calib/camera_override.isp] 1/16
LoadOverridesFile: looking for override file [/data/nvcam/settings/camera_overrides.isp] 2/16
LoadOverridesFile: looking for override file [/opt/nvidia/nvcam/settings/camera_overrides.isp] 3/16
LoadOverridesFile: looking for override file [/var/nvidia/nvcam/settings/camera_overrides.isp] 4/16
LoadOverridesFile: looking for override file [/data/nvcam/camera_overrides.isp] 5/16
LoadOverridesFile: looking for override file [/data/nvcam/settings/e3326_front_P5V27C.isp] 6/16
LoadOverridesFile: looking for override file [/opt/nvidia/nvcam/settings/e3326_front_P5V27C.isp] 7/16
LoadOverridesFile: looking for override file [/var/nvidia/nvcam/settings/e3326_front_P5V27C.isp] 8/16
---- imager: No override file found. ----
PRODUCER: Creating output stream
PRODUCER: Launching consumer thread
Failed to query video capabilities: Inappropriate ioctl for device
NvMMLiteOpen : Block : BlockType = 4 
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 4 
875967048
842091865
create video encoder return true
CONSUMER: Waiting until producer is connected...
PRODUCER: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
SCF: Error InvalidState:  NonFatal ISO BW requested not set. Requested = 2147483647 Set = 4687500 (in src/services/power/PowerServiceCore.cpp, function setCameraBw(), line 653)
===== MSENC blits (mode: 1) into tiled surfaces =====
CONSUMER: Got EOS, exiting...
CONSUMER: Done.
PRODUCER: Done -- exiting.
CONSUMER: Terminating
----------- Element = enc0 -----------
Total Profiling time = 9.78134
Average FPS = 30.1595
Total units processed = 296
Average latency(usec) = 2395
Minimum latency(usec) = 777
Maximum latency(usec) = 4644
-------------------------------------
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function unmakeCurrent() line 249
	Failed to make context current
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function cleanup() line 227
	(propagating)
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function unmakeCurrent() line 249
	Failed to make context current
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function cleanup() line 227
	(propagating)
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function unmakeCurrent() line 249
	Failed to make context current
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function cleanup() line 227
	(propagating)
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function unmakeCurrent() line 249
	Failed to make context current
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function cleanup() line 227
	(propagating)
SCF: Error NotSupported: Unable to make context current (in src/services/gl/GLService.cpp, function unmakeContextCurrent(), line 453)
SCF: Error NotSupported:  (propagating from src/services/gl/GLContext.cpp, function unmakeCurrent(), line 59)
SCF: Error InvalidState: Error destorying EGL context (in src/services/gl/GLService.cpp, function destroyContext(), line 385)
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function unmakeCurrent() line 249
	Failed to make context current
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function cleanup() line 227
	(propagating)
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function unmakeCurrent() line 249
	Failed to make context current
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function cleanup() line 227
	(propagating)
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function unmakeCurrent() line 249
	Failed to make context current
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function cleanup() line 227
	(propagating)
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function unmakeCurrent() line 249
	Failed to make context current
FNET: Error in /dvs/git/dirty/git-master_linux/camera/utils/nvfnet/src/backends/OpenGL/OpenGLEGL.cpp function cleanup() line 227
	(propagating)
************************************
Total Profiling Time = 0 sec
************************************

The MUCH BIGGER problem with this example is the memcpy in ConsumerThread::encoderCapturePlaneDqCallback(). This will absolutely kill performance. There must be a zero-copy method of getting buffers from argus to gstreamer, no?

Hi sperok,
We don't think copying the h264 stream into a GstBuffer is an issue. The sample has already eliminated copying of the YUV420 frames.

We also share an r28.2 patch at https://devtalk.nvidia.com/default/topic/1028387/jetson-tx1/closed-gst-encoding-pipeline-with-frame-processing-using-cuda-and-libargus/post/5256753/#5256753

Thank you for the patch for 28.2.

Forgive me, but it sure seems like unnecessary buffer copies are a very big issue when working at 4k60 or 720p240. In all of our profiling, the memcpy dominates processing time. Why not queue the captures directly into gst buffers?
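Something along these lines, perhaps. This is a minimal, untested sketch: buffer lifetime handling is glossed over, and wrap_encoded_frame()/release_cb() are hypothetical names of mine, not from any sample:

#include <gst/gst.h>
#include "NvBuffer.h"

/* Hand the encoder's capture-plane output to gstreamer without a
 * memcpy(). 'buffer' is the NvBuffer dequeued in
 * encoderCapturePlaneDqCallback(); release_cb() fires once downstream
 * is done with the memory, which is where the buffer would have to be
 * requeued on the capture plane instead of in the dq callback. */
static void
release_cb (gpointer user_data)
{
    NvBuffer *buffer = (NvBuffer *) user_data;
    (void) buffer; /* requeue 'buffer' on the encoder capture plane here */
}

static GstBuffer *
wrap_encoded_frame (NvBuffer *buffer)
{
    return gst_buffer_new_wrapped_full (GST_MEMORY_FLAG_READONLY,
                                        buffer->planes[0].data,
                                        buffer->planes[0].fmt.sizeimage, /* maxsize */
                                        0,                               /* offset */
                                        buffer->planes[0].bytesused,     /* size */
                                        buffer,                          /* user_data */
                                        release_cb);
}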

Hi sperok,
For now I don't have enough gstreamer experience to provide this implementation. If you know how to optimize it, it would be great if you could share a patch.

Can you provide the source code for nvarguscamerasrc that was added in 28.2? I would hope it solves this problem, but it is severely feature-deficient. It would be easier and better to add the missing features to nvarguscamerasrc than to put work into this sample. Critical missing functionality includes parameters for auto-exposure, aeLock, and various other controls that are available in libargus but not exposed by the current plugin.
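For illustration, this is roughly the kind of control we set through libargus directly today and would want the plugin to expose. A sketch using the standard Argus interfaces; configureAe() is a hypothetical helper of mine, and 'request' is a Request created elsewhere via ICaptureSession::createRequest():

#include <Argus/Argus.h>

using namespace Argus;

/* Lock auto-exposure and apply an EV bias on a capture request. */
static bool configureAe(Request *request)
{
    IRequest *iRequest = interface_cast<IRequest>(request);
    if (!iRequest)
        return false;

    IAutoControlSettings *iAcSettings =
        interface_cast<IAutoControlSettings>(iRequest->getAutoControlSettings());
    if (!iAcSettings)
        return false;

    iAcSettings->setAeLock(true);                // freeze auto-exposure
    iAcSettings->setExposureCompensation(-1.0f); // EV bias, in stops
    return true;
}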

Hi sperok,
No, nvarguscamerasrc is not open source. We are evaluating it.

BTW, do you see better performance with gstreamer? We implemented libgstomx.so based on third-party code. You can see it also copies the h264 stream into a GstBuffer in gst_omx_video_enc_handle_output_frame():

Our source code is at https://developer.nvidia.com/embedded/dlc/l4t-tx1-sources-28-2-ga

One very basic principle of Gstreamer is to ALWAYS AVOID BUFFER COPIES. They are absolute death for performance, especially as we move to 4k60 and higher rates. An 8MP image at 12 bits/pixel is 12MB; at 60fps that is 720MB/s of data transfer that should be avoided whenever possible. The expectation of an encoder would be to sink the incoming 720MB/s of raw video and output the compressed h.264 or hevc stream at ~25Mbps without doing any extra buffer copies of the raw data.

This is my first time looking at that encoder code, but if I interpret it correctly, it is not doing a memcpy() of the raw data stream (~1Gbps). The block of code below does a memcpy() to insert the encoded data (~25Mbps) into an output frame, which is still an area of potential performance improvement, but at a much lower data rate than the raw video that nvarguscamerasrc would be dealing with.

if (buf->omx_buf->nFilledLen > 0) {
      outbuf = gst_buffer_new_and_alloc (buf->omx_buf->nFilledLen);

      gst_buffer_map (outbuf, &map, GST_MAP_WRITE);
      memcpy (map.data,
          buf->omx_buf->pBuffer + buf->omx_buf->nOffset,
          buf->omx_buf->nFilledLen);
      gst_buffer_unmap (outbuf, &map);

Hi sperok,
It looks like you don't understand the patch correctly. If you read it carefully, you will realize the memcpy() is there to insert the encoded data (~25Mbps), the same as gstreamer's gstomx.