Using VPI in GStreamer

Hi,

I'm trying to perform camera undistortion using VPI with the VIC backend in GStreamer, combining two examples: /opt/nvidia/vpi2/samples/11-fisheye and nvsample_cudaprocess. In the nvsample_cudaprocess.cu gpu_process function I use vpiSubmitRemap, similar to the 11-fisheye/main.cpp main function, but first attempt to convert the EGLImageKHR to a VPIImage using vpiImageCreateEGLImageWrapper. The following snippet, part of nvsample_cudaprocess.cu's gpu_process, used to work with VPI 1:

vpiImageCreateEGLImageWrapper(image, NULL, VPI_BACKEND_CUDA, &vimg); // wrap the EGLImageKHR as a VPIImage
vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, vimg, tmpIn, NULL); // convert into the remap working format
vpiSubmitRemap(stream, VPI_BACKEND_CUDA, remap, tmpIn, tmpOut, VPI_INTERP_CATMULL_ROM, VPI_BORDER_ZERO, 0); // undistort
vpiSubmitConvertImageFormat(stream, VPI_BACKEND_CUDA, tmpOut, vimg, NULL); // convert back into the wrapped buffer
vpiStreamSync(stream); // wait for all submitted work to finish

using
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1' ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nvoverlaysink display-id=0 -e

The problem is that vpi/EGLInterop.h is missing in VPI 2, so this no longer builds. Is there a substitute for, or a sample equivalent of, vpiImageCreateEGLImageWrapper in VPI 2? There is a similar topic, VPI in a GStreamer pipeline, but it is likely using VPI 1.
A side issue with Remap: for 1080p NV12 the latency of the CUDA backend compares favorably to VIC, but that comparison doesn't account for the percentage of GPU utilization.

Thanks.

Hi,

The API changed in VPI 2, but this should still work.
Please check the following document for information.

https://docs.nvidia.com/vpi/group__VPI__Image.html#ga3e7cf2520dd568a7e7a9a6876ea7995c

Thanks.

Hi,

Replaced

// vpiImageCreateEGLImageWrapper(image, NULL, VPI_BACKEND_CUDA, &vimg);
vpiImageCreateWrapper((VPIImageData*)image, NULL, VPI_BACKEND_CUDA, &vimg);

where image is an EGLImageKHR and vimg is a VPIImage. This compiles but doesn't run: the gpu_process function is never called. From the traces, there is a more basic problem with nvivafilter; even running the preinstalled /usr/lib/aarch64-linux-gnu/libnvsample_cudaprocess.so, it errors out in

GST_DEBUG=2 gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! 'video/x-raw(memory:NVMM), width=(int)3840, height=(int)2160, framerate=(fraction)30/1, format=(string)NV12' ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nvvidconv ! xvimagesink -e

Setting pipeline to PAUSED …
Pipeline is live and does not need PREROLL …
Setting pipeline to PLAYING …
New clock: GstSystemClock
GST_ARGUS: Creating output stream
CONSUMER: Waiting until producer is connected…
GST_ARGUS: Available Sensor modes :
GST_ARGUS: 3840 x 2160 FR = 29.999999 fps Duration = 33333334 ; Analog Gain range min 1.000000, max 16.000000; Exposure Range min 62000, max 250000000;

GST_ARGUS: Running with following settings:
Camera index = 0
Camera mode = 0
Output Stream W = 3840 H = 2160
seconds to Run = 0
Frame Rate = 29.999999
GST_ARGUS: Setup Complete, Starting captures for 0 seconds
GST_ARGUS: Starting repeat capture requests.
CONSUMER: Producer has connected; continuing.
0:00:00.370266780 9972 0xaaaae77db400 WARN basesrc gstbasesrc.c:3072:gst_base_src_loop: error: Internal data stream error.
0:00:00.370353565 9972 0xaaaae77db400 WARN basesrc gstbasesrc.c:3072:gst_base_src_loop: error: streaming stopped, reason error (-5)
ERROR: from element /GstPipeline:pipeline0/GstNvArgusCameraSrc:nvarguscamerasrc0: Internal data stream error.
Additional debug info:
gstbasesrc.c(3072): gst_base_src_loop (): /GstPipeline:pipeline0/GstNvArgusCameraSrc:nvarguscamerasrc0:
streaming stopped, reason error (-5)
EOS on shutdown enabled -- waiting for EOS after Error
Waiting for EOS…

When compiled from source, the same error occurs: it executes the nvsample_cudaprocess.cu init function, but neither pre_process nor gpu_process is called, and the video render stays blank. The basic pipeline without nvivafilter still works:

gst-launch-1.0 nvarguscamerasrc sensor-id=0 ! 'video/x-raw(memory:NVMM), width=(int)3840, height=(int)2160, framerate=(fraction)30/1, format=(string)NV12' ! nvvidconv ! xvimagesink -e

The only difference from Accelerated_GStreamer_User_Guide.pdf page 38 is nvoverlaysink, which is no longer installed.
Have there been any recent changes to the nvivafilter API or to the nvsample_cudaprocess.cu source?

Thanks.

This is an old post from early VPI versions, but you may have a look at it:

Hi,

Incidentally, I had also referenced that older post, insofar as it used vpiImageCreateEGLImageWrapper, since there were no other examples in vpi1/samples; that function is now deprecated in VPI 2 and superseded by vpiImageCreateWrapper. While I'm not currently using parallel pipelines, that part of the post could still be useful later, once the single pipeline starts working. For now I'm trying to get the filter running in the same pipeline as documented in the Accelerated_GStreamer_User_Guide.pdf Sample CSI Camera pipeline. Since that 2019 sample doesn't work as is (with nvvidconv ! xvimagesink instead of nvoverlaysink), even with the prebuilt nvidia-l4t-gstreamer binary libnvsample_cudaprocess.so, I was wondering whether there were any other changes to gstreamer/nvivafilter/nvsample_cudaprocess.cu? Only after the sample starts working can I try substituting a privately built one with VPI code.
Not sure why, but I can only see the first line of rotation_filer.cu, not an attachment.

Thanks.

Hi,

When you create the VPIImageData, the buffer type should be VPI_IMAGE_BUFFER_EGLIMAGE.

https://docs.nvidia.com/vpi/group__VPI__Image.html#ga895b41338ff328601a68383c7ae939d0
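
For example, inside gpu_process it could look roughly like this (a minimal sketch based on the VPIImageData documentation above; error checking omitted):

// Wrap the incoming EGLImageKHR in a VPIImageData instead of casting it directly
VPIImageData data;
memset(&data, 0, sizeof(data));              // needs <string.h>
data.bufferType = VPI_IMAGE_BUFFER_EGLIMAGE; // the buffer holds an EGLImage
data.buffer.egl = image;                     // the EGLImageKHR passed to gpu_process

VPIImage vimg = NULL;
vpiImageCreateWrapper(&data, NULL, VPI_BACKEND_CUDA, &vimg);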

For the nvivafilter issue, could you try clearing the GStreamer cache to see if it helps?

$ rm -rf ${HOME}/.cache/gstreamer-1.0

Thanks.

Hi,

The nvsample_cudaprocess.cu that comes with nvsample_cudaprocess_src.tbz2 in the Jetson Linux Driver Package (L4T) sources has, on line 194:

static void gpu_process (EGLImageKHR image, void ** usrptr) {

The vpiImageCreateWrapper VPIImageBufferType documentation just states EGLImage, so it's not clear whether the EGLImageKHR image is supposed to be cast directly to VPIImageData*, or whether the underlying buffer should first be extracted from the EGLImageKHR and then cast to VPIImageData*, and if so, how? Looking through vpi2/samples and jetson_multimedia_api/samples, I couldn't find a good example of converting an EGLImageKHR to a VPIImageData.

For the nvivafilter, clearing the plugin cache unfortunately didn't fix the error; it still fails the same way with -5. I can turn on GST_DEBUG=5 and GST_DEBUG_DUMP_DOT_DIR, and since that log is going to be very long, are there any specific warnings/errors I should filter on?

Thanks.

Hi,

We are going to share a sample that uses VPI 2 in GStreamer.
We will update you with more information later.

For the nvivafilter, there will be a GA release soon.
Would you mind testing again once the JetPack 5 GA comes out?

Thanks.

Hi,

For the nvivafilter, if there is a candidate version of libgstnvivafilter.so or libgstnvarguscamerasrc.so before the official GA comes out, I could test that too. Recompiling GStreamer 1.16.3 did not make a difference, since it doesn't change the NVIDIA plugin binaries.

Thanks.

Hi,

The official GA has just come out; please give it a try.

Thanks.

Hi,

JetPack 5.0.2 installs successfully, but the nvarguscamerasrc camera source doesn't work. This is not due to GStreamer or libgstnvarguscamerasrc.so but to the camera driver. The binaries for the custom IMX477 camera driver for Jetson Linux 34.1.1 don't work on 35.1, and even after recompiling the custom kernel from sources the camera is still not detected.

v4l2-ctl --list-devices
Cannot open device /dev/video0

ls /dev/video*
empty

dmesg | grep -E "imx477"
empty

python /opt/nvidia/jetson-io/jetson-io.py
This only shows IMX274, IMX185, and IMX390; it doesn't show IMX477. With the official 5.0.2/35.1 I'm not sure why it doesn't show up, since there is a native driver, but even if it did, I'm still not sure it would work, since there is an LI-JXAV-MIPI-ADPT-4CAM adapter in between.

Regarding VPI 2, once you have the EGLImageKHR-to-VPIImageData snippet/sample, I could try to find a different camera.

Thanks.

Hi,

Thanks for testing.
We will help check the IMX477 issue and get back to you.

@predrag12
The default native Orin device tree doesn't support the IMX477.
I would suggest consulting with Leopard to get the device tree and driver for Orin r35.1.

Thanks

Hi,

Reading Configuring the CSI Connector, for other Jetsons the IMX477 can be set as the default camera with /opt/nvidia/jetson-io/jetson-io.py, and the kernel sources contain nv_imx477.c and imx477_mode_tbls.h, so I assumed there was native driver support for Orin.
The custom LI device tree and driver for IMX477 works on Jetson Linux 34.1.1 (sans nvivafilter) but does not work on 35.1, so the long-term solution is to wait for such a driver update. Meanwhile, as a short/mid-term recourse, I might use one of the other cameras over the same LI adapter for which there is a native Orin driver. Would the NVIDIA IMX390 driver work, or is it meant for some other adapter?

Thanks.

Hi,

Sorry for the late reply.

Would you mind filing another topic about the camera support of Orin?
Let us focus on the GStreamer + VPI in this topic.

We solved the nvivafilter issue in r35.1.
Please replace the libgstnvivafilter.so with this attachment (17.4 KB).

Then please update the Makefile for the Orin GPU architecture:

diff --git a/Makefile b/Makefile
index e6b05fd..043a943 100644
--- a/Makefile
+++ b/Makefile
@@ -111,15 +111,11 @@ LIBRARIES += -L$(TEGRA_LIB_DIR) -lcuda -lrt
 ifneq ($(OS_ARCH),armv7l)
 GENCODE_SM10    := -gencode arch=compute_10,code=sm_10
 endif
-GENCODE_SM30    := -gencode arch=compute_30,code=sm_30
-GENCODE_SM32    := -gencode arch=compute_32,code=sm_32
-GENCODE_SM35    := -gencode arch=compute_35,code=sm_35
-GENCODE_SM50    := -gencode arch=compute_50,code=sm_50
-GENCODE_SMXX    := -gencode arch=compute_50,code=compute_50
 GENCODE_SM53    := -gencode arch=compute_53,code=sm_53
 GENCODE_SM62    := -gencode arch=compute_62,code=sm_62
 GENCODE_SM72    := -gencode arch=compute_72,code=sm_72
-GENCODE_SM_PTX  := -gencode arch=compute_72,code=compute_72
+GENCODE_SM87    := -gencode arch=compute_87,code=sm_87
+GENCODE_SM_PTX  := -gencode arch=compute_87,code=compute_87
 ifeq ($(OS_ARCH),armv7l)
 GENCODE_FLAGS   ?= $(GENCODE_SM32)
 else

Then you should be able to run it with the following pipeline (tested with an IMX274):

$ gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),format=NV12' ! nvivafilter cuda-process=true pre-process=true post-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM), format=(string)RGBA' ! nv3dsink

We have confirmed that gpu_process executes normally.

Next, we are going to make a VPI sample based on nvsample_cudaprocess.
We will let you know once it is completed.

Thanks.

Hi,

Attached is a sample for nvsample_cudaprocess with VPI for your reference.
0001-Add-VPI-support.patch (5.0 KB)

Thanks.

Hi,

We can focus on GStreamer and VPI, but in r35.1 there is a dependency on the CSI camera (driver), so I cannot execute gst-launch-1.0 nvarguscamerasrc exactly the same way; I'm using a USB camera instead. Alternatively, would the fixed libgstnvivafilter.so work on 34.1.1, since GStreamer didn't change? Or is the IMX274 driver compatible with the LI board?

Thank you for the sample. I was not able to run it with

gst-launch-1.0 -v v4l2src device=/dev/video0 ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)1280, height=(int)960, framerate=(fraction)30/1, format=(string)NV12' ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM), format=(string)RGBA' ! nv3dsink -e

Invalid eglcolorformat 7
Error: VPI_ERROR_INVALID_IMAGE_FORMAT in nvsample_cudaprocess.cu at line 293 (CUDA: Conversion not implemented between VPIImageFormat(VPI_COLOR_MODEL_RGB,VPI_COLOR_SPEC_UNDEFINED,VPI_MEM_LAYOUT_PITCH_LINEAR,VPI_DATA_TYPE_UNSIGNED,WZYX,X8_Y8_Z8_W8) and VPI_IMAGE_FORMAT_RGBA8)
Error: VPI_ERROR_INVALID_ARGUMENT in nvsample_cudaprocess.cu at line 294 (Input and output images must have the same format)
Error: VPI_ERROR_INVALID_IMAGE_FORMAT in nvsample_cudaprocess.cu at line 297 (CUDA: Conversion not implemented between VPI_IMAGE_FORMAT_RGBA8 and VPIImageFormat(VPI_COLOR_MODEL_RGB,VPI_COLOR_SPEC_UNDEFINED,VPI_MEM_LAYOUT_PITCH_LINEAR,VPI_DATA_TYPE_UNSIGNED,WZYX,X8_Y8_Z8_W8))

If line 275 is changed to VPI_IMAGE_FORMAT_NV12, which would avoid an unnecessary NV12->RGBA->NV12 round trip for downstream, then

gst-launch-1.0 -v v4l2src device=/dev/video0 ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)1280, height=(int)960, framerate=(fraction)30/1, format=(string)NV12' ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nv3dsink -e

Error: VPI_ERROR_INVALID_IMAGE_FORMAT in nvsample_cudaprocess.cu at line 294 (Only image formats with full range are accepted, not VPI_IMAGE_FORMAT_NV12)

If lines 291-297 are changed to VPI_BACKEND_VIC and line 275 to VPI_IMAGE_FORMAT_NV12, then it works, but slowly. I modified
nvsample_cudaprocess.cu (10.6 KB)
to move the one-time allocations outside gpu_process, but even then processing is 20 ms per 1280x960 frame, which is very slow and doesn't justify offloading dewarping to the VIC. Shouldn't it be around 3 ms per 1920x1080 according to
VPI - Vision Programming Interface: Remap?
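
Roughly, the hoisting looks like this (a sketch with assumed names; init_vpi is a hypothetical helper called from the library's init(), and the warp map construction is elided):

// VPI objects created once at startup and reused for every frame
static VPIStream  stream = NULL;
static VPIPayload remap  = NULL;
static VPIImage   tmpIn  = NULL;
static VPIImage   tmpOut = NULL;

static void init_vpi(int w, int h, const VPIWarpMap *warpMap)
{
    vpiStreamCreate(0, &stream);
    vpiCreateRemap(VPI_BACKEND_VIC, warpMap, &remap);        // undistortion map payload
    vpiImageCreate(w, h, VPI_IMAGE_FORMAT_NV12, 0, &tmpIn);  // scratch input
    vpiImageCreate(w, h, VPI_IMAGE_FORMAT_NV12, 0, &tmpOut); // scratch output
}
// gpu_process then only wraps the EGLImage, submits the remap, and syncs.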

Thanks.

Hi,

Thanks for the feedback.

Does the default pipeline (without the VPI part) work in your environment?

$ gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),format=NV12' ! nvivafilter cuda-process=true pre-process

There are some issues in the nvivafilter library, so you will need to update it with the library shared above.
We tested this on r35.1; we are not sure whether the library works on r34 or not.

Thanks.

Hi,

The default pipeline with GStreamer nvarguscamerasrc unfortunately does not work, since the driver doesn't work on r35.1; hence I'm curious how the IMX274 works, since it still needs some adapter/driver for J509.

Reverting to 34.1.1 and replacing libgstnvivafilter.so with the one from r35.1, both the CUDA and VIC backends do work, using remap instead of perspective warp: nvsample_cudaprocess.cu (11.7 KB).

gst-launch-1.0 -v nvarguscamerasrc ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, framerate=(fraction)30/1, format=(string)NV12' ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nv3dsink -e

However, processing is still very slow: ~17 ms per 1920x1080 frame with the VIC backend and ~14 ms with the CUDA backend, even with a simpler interpolation type or a lower resolution. Comparing against the expected numbers in https://docs.nvidia.com/vpi/algo_remap.html#algo_remap_perf and against observed cv::cuda::remap times, it would appear there is some unaccounted-for overhead much bigger than the processing itself (3 ms or 0.4 ms respectively). Since VPI operations are issued as stream transactions towards the VIC, I'm not sure how to profile the bottleneck?

Thanks.

I'm currently unable to try your case, but be sure to measure the period after init time… You may discard measurements from the first frames (say 20 frames) and compute the average over a minimum of 100 frames.
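
For instance, something like this inside gpu_process (a sketch; timed_remap and the counters are illustrative, not part of the sample):

// Skip the first 20 frames as warm-up, then average over the next 100
#include <chrono>
#include <cstdio>

static int    skipped = 0, measured = 0;
static double total_ms = 0.0;

static void timed_remap(VPIStream stream, VPIPayload remap, VPIImage in, VPIImage out)
{
    auto t0 = std::chrono::steady_clock::now();
    vpiSubmitRemap(stream, VPI_BACKEND_VIC, remap, in, out,
                   VPI_INTERP_NEAREST, VPI_BORDER_ZERO, 0);
    vpiStreamSync(stream); // measure completion, not just submission
    auto t1 = std::chrono::steady_clock::now();

    if (skipped < 20) { skipped++; return; } // discard init/warm-up frames
    total_ms += std::chrono::duration<double, std::milli>(t1 - t0).count();
    if (++measured == 100)
        printf("avg remap: %.2f ms over %d frames\n", total_ms / measured, measured);
}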

For monitoring VIC activity, you may have a look at sysfs:

sudo su

ls /sys/kernel/debug/vic
cat /sys/kernel/debug/vic/actmon_avg

ls /sys/kernel/debug/bpmp/debug/clk/ | grep vic
cat /sys/kernel/debug/bpmp/debug/clk/nafll_vic/rate

exit