VisionWorks + 2 CSI Cameras + White Balance Control

I’m trying to put together an application that uses two CSI cameras for stereo machine vision. I’d like to synchronize the white balance settings of the two cameras, and I’ve been able to achieve this using a single capture session with libArgus. The next step I’d like to take is to route the resulting captured frames into a VisionWorks graph, but it’s not at all clear how I can accomplish this. I’ve scoured the VisionWorks and libArgus documentation, but I’m not seeing a solution.

Can anyone point me in the right direction?

Hi,

We don’t enable an Argus+VisionWorks pipeline.

It’s recommended to read the camera stream directly via VisionWorks.
VisionWorks uses a GStreamer backend and should be able to adjust white balance for your use case:
https://devtalk.nvidia.com/default/topic/1008913/jetson-tx1/tx1-gstreamer-and-nvcamerasrc-manual-white-balance/post/5148589/#5148589

Thanks.

I appreciate the link, but I fear that solving my problem the way you suggest presupposes that I know exactly what the white-balance values need to be in advance. I want the system to be able to adapt, while keeping the two cameras’ settings in sync. With Argus, I can (and do) maintain that kind of fine control over the camera settings, with minimal lag in the frame pipeline.

My understanding is that Argus is the recommended library to use if you desire to maintain tight control over the camera – am I mistaken? It seems odd to me that these two libraries would be designed in such a way as to be incompatible.

Hi,

Yes, Argus is the recommended library to control the onboard camera.

However, the Argus-with-VisionWorks path is not enabled.
Currently, VisionWorks only supports GStreamer as a camera input.

A possible alternative is to use GStreamer to apply white balance; otherwise, you may need to handle a memcpy with CUDA yourself.
Thanks.

I’ve taken the “memcpy with CUDA” approach, using a combination of GStreamerEGLStreamSinkFrameSourceImpl and NvMediaCSI10640CameraFrameSourceImpl as inspiration. It’s functional, and performance is decent.

I am, however, having a stability issue. After running for a minute or two, I’ll get a crash with what looks like very deep recursion, like so:

#0  0x0000007fb6378e94 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#1  0x0000007fb64130b4 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#2  0x0000007fb646c068 in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#3  0x0000007fb6404bac in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#4  0x0000007fb6404fbc in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#5  0x0000007fb632882c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#6  0x0000007fb633115c in ?? () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#7  0x0000007fb6460764 in cuMemcpy3D_v2 () from /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1
#8  0x00000000004dc178 in cudart::driverHelper::driverMemcpy3D(CUDA_MEMCPY3D_st*, CUstream_st*, bool, bool) ()
#9  0x00000000004da5b4 in cudart::arrayHelper::copyToDevice2D(CUmemorytype_enum, cudaArray const*, unsigned long, unsigned long, char*, unsigned long, unsigned long, unsigned long, unsigned long, CUstream_st*, bool, bool) ()
#10 0x00000000004dc02c in cudart::driverHelper::memcpy2DFromArray(char*, unsigned long, cudaArray const*, unsigned long, unsigned long, unsigned long, unsigned long, cudaMemcpyKind, CUstream_st*, bool, bool) ()
#11 0x00000000004dc02c in cudart::driverHelper::memcpy2DFromArray(char*, unsigned long, cudaArray const*, unsigned long, unsigned long, unsigned long, unsigned long, cudaMemcpyKind, CUstream_st*, bool, bool) ()
#12 0x00000000004dc02c in cudart::driverHelper::memcpy2DFromArray(char*, unsigned long, cudaArray const*, unsigned long, unsigned long, unsigned long, unsigned long, cudaMemcpyKind, CUstream_st*, bool, bool) ()
#13 0x00000000004dc02c in cudart::driverHelper::memcpy2DFromArray(char*, unsigned long, cudaArray const*, unsigned long, unsigned long, unsigned long, unsigned long, cudaMemcpyKind, CUstream_st*, bool, bool) ()
#14 0x00000000004dc02c in cudart::driverHelper::memcpy2DFromArray(char*, unsigned long, cudaArray const*, unsigned long, unsigned long, unsigned long, unsigned long, cudaMemcpyKind, CUstream_st*, bool, bool) ()
#15 0x00000000004dc02c in cudart::driverHelper::memcpy2DFromArray(char*, unsigned long, cudaArray const*, unsigned long, unsigned long, unsigned long, unsigned long, cudaMemcpyKind, CUstream_st*, bool, bool) ()

with thousands upon thousands of stacked-up calls to cudart::driverHelper::memcpy2DFromArray. It’s a little odd, and it’s obviously crashing well inside the CUDA library. Do you have any suggestions for how I might debug this? Are there typical things I can look for, flags I can flip, or logs I can check that might give me a clue as to what I’m doing that’s causing this?

Hi,

It is recommended to check your application with cuda-memcheck.

cuda-memcheck ./[app]

Thanks.

I did as you suggested:

========= Error: process didn't terminate successfully
=========        The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= No CUDA-MEMCHECK results found

If I follow that program’s suggestion, and run it under cuda-gdb, it’ll run for hours and hours with no fault – until I stop it. Weird, huh? What could be going on here?

Hi,

A common cause is concurrent read/write access.
Could you check whether there is any possibility that the CPU and GPU read or write the same buffer concurrently?

Thanks.

My test application is currently single-threaded. I grab a frame from an EGLStream, copy the two planes into a vx_image, and then use the vx_image to generate an RGB frame that I write out to a v4l2 loopback device for inspection. It’ll run for several minutes before it dies (with a stack trace similar to the one above). I switched out the asynchronous copy calls I pulled from the nvxio source for their synchronous versions, but the problem persists. Here’s the source for the frame grabber, which is designed to pull from two cameras and create a single image from them on each grab:

#include "EGLStreamFrameSource.hpp"

#include <cuda.h>
#include <cuda_runtime_api.h>

#include <NVX/nvxcu.h>

#include <cstring>
#include <unistd.h>

using namespace ovxio;


#define PRINT(format, ...) printf(format "\n", ## __VA_ARGS__)


namespace Theia {

extern PFNEGLQUERYSTREAMKHRPROC                       eglQueryStreamKHR;
extern PFNEGLSTREAMATTRIBKHRPROC                      eglStreamAttribKHR;
extern PFNEGLSTREAMCONSUMERGLTEXTUREEXTERNALKHRPROC   eglStreamConsumerGLTextureExternalKHR;

EGLStreamFrameSource::EGLStreamFrameSource(vx_context context, vx_uint32 frameWidth, vx_uint32 frameHeight, EGLDisplay display, std::vector<EGLStreamKHR> streams) :
    FrameSource(FrameSource::CAMERA_SOURCE, "EGLStreamFrameSource"),
    m_context(context),
    m_streams(streams),
    m_cudaConnections(streams.size(), 0),
    m_display(display),
    m_frameWidth(frameWidth)
{
  m_configuration.frameWidth = frameWidth * streams.size();
  m_configuration.frameHeight = frameHeight;
  m_configuration.fps = 30u; // TODO
  m_configuration.format = NVXCU_DF_IMAGE_RGB;
}

bool EGLStreamFrameSource::open()
{
  close();

  if (cudaSuccess != cudaFree(nullptr)) {
    PRINT("Failed to initialize CUDA context");
    return false;
  }

  for (int i = 0, iMax = m_streams.size(); i < iMax; i++) {
    CUresult curesult = cuEGLStreamConsumerConnect(&m_cudaConnections[i], m_streams[i]);
    if (CUDA_SUCCESS != curesult) {
      PRINT("Connect CUDA consumer ERROR %d", curesult);
      return false;
    }
  }

  m_nv12Frame = vxCreateImage(m_context, m_configuration.frameWidth, m_configuration.frameHeight, VX_DF_IMAGE_NV12);
  NVXIO_CHECK_REFERENCE(m_nv12Frame);

  return (m_isOpen = true);
}

FrameSource::FrameStatus EGLStreamFrameSource::fetch(vx_image image, uint32_t timeout /*milliseconds*/)
{
  if (!m_isOpen) {
    return FrameSource::FrameStatus::CLOSED;
  }

  vx_image workingImage = (m_configuration.format == NVXCU_DF_IMAGE_NV12) ? image : m_nv12Frame;

  for (int i = 0, iMax = m_streams.size(); i < iMax; i++) {
    // Check for new frames in EglStream
    EGLint streamState;
    do {
      if (!eglQueryStreamKHR(m_display, m_streams[i], EGL_STREAM_STATE_KHR, &streamState)) {
        PRINT("Cuda consumer, eglQueryStreamKHR EGL_STREAM_STATE_KHR failed");
        close();
        return FrameSource::FrameStatus::CLOSED;
      }
      switch (streamState) {
        case EGL_STREAM_STATE_DISCONNECTED_KHR: {
          PRINT("CUDA Consumer: - EGL_STREAM_STATE_DISCONNECTED_KHR received");
          close();
          return FrameSource::FrameStatus::CLOSED;
        }
        case EGL_STREAM_STATE_EMPTY_KHR:
        case EGL_STREAM_STATE_OLD_FRAME_AVAILABLE_KHR:
        case EGL_STREAM_STATE_CONNECTING_KHR: {
          usleep(1000);
          break;
        }
      }
    } while (streamState != EGL_STREAM_STATE_NEW_FRAME_AVAILABLE_KHR);

    CUgraphicsResource cudaResource = nullptr;
    CUeglFrame eglFrame;
    CUresult cuStatus;

    cuStatus = cuEGLStreamConsumerAcquireFrame(&m_cudaConnections[i], &cudaResource, nullptr, timeout * 1000);
    if (cuStatus != CUDA_SUCCESS) {
      PRINT("Cuda Acquire failed cuStatus=%d", cuStatus);
      close();
      return FrameSource::FrameStatus::CLOSED;
    }

    cuStatus = cuGraphicsResourceGetMappedEglFrame(&eglFrame, cudaResource, 0, 0);
    if (cuStatus != CUDA_SUCCESS) {
      PRINT("Cuda get resource failed with %d", cuStatus);
      cuEGLStreamConsumerReleaseFrame(&m_cudaConnections[i], cudaResource, nullptr);
      close();
      return FrameSource::FrameStatus::CLOSED;
    }

    NVXIO_ASSERT(eglFrame.frameType == CU_EGL_FRAME_TYPE_ARRAY);
    NVXIO_ASSERT(eglFrame.cuFormat == CU_AD_FORMAT_UNSIGNED_INT8);
    NVXIO_ASSERT(eglFrame.eglColorFormat == CU_EGL_COLOR_FORMAT_YUV420_SEMIPLANAR);
    NVXIO_ASSERT(eglFrame.planeCount == 2);

    NVXIO_ASSERT(eglFrame.height == m_configuration.frameHeight);
    NVXIO_ASSERT(eglFrame.width * m_streams.size() == m_configuration.frameWidth);

    vx_rectangle_t rect = {
      eglFrame.width * i,
      0,
      eglFrame.width * (i + 1),
      eglFrame.height
    };

    vx_imagepatch_addressing_t addr;
    void *ptr;
    vx_map_id map_id;

    // copy the first plane y

    NVXIO_SAFE_CALL( vxMapImagePatch(workingImage, &rect, 0, &map_id, &addr, &ptr, VX_WRITE_ONLY, NVX_MEMORY_TYPE_CUDA, VX_NOGAP_X) );

    NVXIO_ASSERT( cudaMemcpy2DFromArray(ptr, addr.stride_y,
                                        (const struct cudaArray *) eglFrame.frame.pArray[0],
                                        0, 0,
                                        eglFrame.width * sizeof(vx_uint8), addr.dim_y,
                                        cudaMemcpyDeviceToDevice) == cudaSuccess );

    NVXIO_SAFE_CALL( vxUnmapImagePatch(workingImage, map_id) );

    // copy the second plane u/v

    NVXIO_SAFE_CALL( vxMapImagePatch(workingImage, &rect, 1, &map_id, &addr, &ptr, VX_WRITE_ONLY, NVX_MEMORY_TYPE_CUDA, VX_NOGAP_X) );

    NVXIO_ASSERT( (cudaMemcpy2DFromArray(ptr, addr.stride_y,
                                         (const struct cudaArray *)eglFrame.frame.pArray[1],
                                         0, 0,
                                         ((addr.dim_x * sizeof(vx_uint16)) >> 1), addr.dim_y >> 1,
                                         cudaMemcpyDeviceToDevice) == cudaSuccess) );

    NVXIO_SAFE_CALL( vxUnmapImagePatch(workingImage, map_id) );

    cuStatus = cuEGLStreamConsumerReleaseFrame(&m_cudaConnections[i], cudaResource, nullptr);
    if (cuStatus != CUDA_SUCCESS) {
      PRINT("Cuda release frame failed with %d", cuStatus);
    }
  }

  if (workingImage != image) {
    NVXIO_SAFE_CALL( vxuColorConvert(m_context, workingImage, image) );
  }

  return FrameSource::FrameStatus::OK;
}

FrameSource::Parameters EGLStreamFrameSource::getConfiguration() {
  return m_configuration;
}

bool EGLStreamFrameSource::setConfiguration(const FrameSource::Parameters& params)
{
  // reject any attempt to change FPS, width, or height
  if (params.frameWidth != (uint32_t)-1) {
    return false;
  }
  if (params.frameHeight != (uint32_t)-1) {
    return false;
  }
  if (params.fps != (uint32_t)-1) {
    return false;
  }

  return true;
}

void EGLStreamFrameSource::close()
{
  if (!m_isOpen) {
    return;
  }

  if (m_nv12Frame) {
    vxReleaseImage(&m_nv12Frame);
    m_nv12Frame = nullptr;
  }

  m_isOpen = false;
}

EGLStreamFrameSource::~EGLStreamFrameSource() {
  close();
}

}

Hi,

Suppose there is CPU/GPU processing across the pipeline you mentioned in comment #9.

Please remember to add a synchronize call when switching the executing processor.
Concurrent access between processors may occur if no synchronize call is inserted.

Thanks.

None that I’m aware of. libArgus creates a ton of threads with no explanation, but I’m seeing this in completely synchronous code on the main thread.

Interestingly, I am unable to reproduce the crash if I only copy the Y plane of each frame. The issue seems to be with copying the combined UV plane; if I omit that, it’ll run all day. Below is the relevant section. Am I doing anything obviously wrong here?

NVXIO_SAFE_CALL( vxMapImagePatch(workingImage, &rect, 1, &map_id, &addr, &ptr, VX_WRITE_ONLY, NVX_MEMORY_TYPE_CUDA, VX_NOGAP_X) );

NVXIO_ASSERT( (cudaMemcpy2DFromArray(ptr, addr.stride_y,
                                     (const struct cudaArray *)eglFrame.frame.pArray[1],
                                     0, 0,
                                     ((addr.dim_x * sizeof(vx_uint16)) >> 1), addr.dim_y >> 1,
                                     cudaMemcpyDeviceToDevice) == cudaSuccess) );

NVXIO_SAFE_CALL( vxUnmapImagePatch(workingImage, map_id) );

Hi,

Based on the log in comment #5, this is a known issue in libcuda.so.1 and has already been fixed internally.
We are checking with our CUDA team and will update you if there is anything we can share at this time.

Thanks.