Resources for custom nvivafilter for Jetpack 5.1.1 Jetson NX

Hello,

After an exhausting search of the forums and the rest of the internet, I found no documentation on customizing the nvivafilter. The fact that I am a beginner in CUDA programming makes it even more frustrating.

I am trying to figure out how to customize the nvivafilter for alpha blending (per-pixel method), where the alpha channel consists of constants stored in a file and does not need to be computed in the GStreamer pipeline. I found a few solutions to start with, but I find it very difficult to even begin. Can you provide me with the necessary resources and help me out here?

The following are the resources I found that are most relevant to my problem:

Thanks in advance,
Kind regards,
nikolas_h

Hi,
NvBufSurface APIs are more flexible, so we would suggest using that interface. You can start from the samples:

/usr/src/jetson_multimedia_api/samples

The function calls for getting the GPU pointer are demonstrated in cuda_postprocess() in

/usr/src/jetson_multimedia_api/samples/12_v4l2_camera_cuda
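
For orientation, the EGLImage-to-CUDA-pointer flow used in those samples follows the pattern sketched below (a sketch only, with error handling omitted; the exact code in cuda_postprocess() may differ):

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <cudaEGL.h>

/* Map an EGLImage to a CUDA device pointer and hand it to a kernel. */
static void process_egl_image(EGLImageKHR egl_image)
{
  CUgraphicsResource resource;
  CUeglFrame egl_frame;

  cuGraphicsEGLRegisterImage(&resource, egl_image,
                             CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
  cuGraphicsResourceGetMappedEglFrame(&egl_frame, resource, 0, 0);
  cuCtxSynchronize();

  /* egl_frame.frame.pPitch[0] now points to plane 0 in device memory;
     launch CUDA kernels on it here. */

  cuCtxSynchronize();
  cuGraphicsUnregisterResource(resource);
}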

If you prefer using the nvivafilter plugin, please build this sample and give it a try:
Nvcompositor plugs alpha is not work and alpha plugs no work - #14 by DaneLLL

Hello DaneLLL,

It is important for my application to use the GStreamer features as well. So, from the resources that I found, I concluded that I am restricted to using the nvivafilter plugin to achieve my objective. The samples in jetson_multimedia_api are for developers who do not intend to use GStreamer.

Am I missing something here?
Can I connect the jetson_multimedia_api (in case it is more straightforward for me) with GStreamer?

Also, I’ve tried to run the 12_v4l2_camera_cuda example by copying the files to my home folder and then running make in the sample’s folder. The build finished successfully, but when running:

./v4l2_camera_cuda -s 640x480 -f UYVY -v

I get the following error, so I cannot pass -c on the command line:

[ERROR] (NvEglRenderer.cpp:386) Could not get EglImage from fd. Not rendering

I have also tried the suggestion you mentioned for the nvivafilter plugin; however, I get errors and cannot move forward.

I have downloaded and unzipped the file in my home folder. When I run:

nvcc nvsample_cudaprocess.cu

I get the following error:

nvsample_cudaprocess.cu(47): error: identifier "BBOX" is undefined

nvsample_cudaprocess.cu(67): error: identifier "ColorFormat" is undefined

nvsample_cudaprocess.cu(76): error: identifier "COLOR_FORMAT_U8_V8" is undefined

nvsample_cudaprocess.cu(84): error: identifier "COLOR_FORMAT_RGBA" is undefined

nvsample_cudaprocess.cu(116): error: identifier "ColorFormat" is undefined

nvsample_cudaprocess.cu(127): error: identifier "COLOR_FORMAT_U8_V8" is undefined

nvsample_cudaprocess.cu(135): error: identifier "COLOR_FORMAT_RGBA" is undefined

nvsample_cudaprocess.cu(180): error: identifier "NUM_LOCATIONS" is undefined

nvsample_cudaprocess.cu(249): error: incomplete type is not allowed

nvsample_cudaprocess.cu(249): error: identifier "CustomerFunction" is undefined

nvsample_cudaprocess.cu(249): error: identifier "pFuncs" is undefined

nvsample_cudaprocess.cu(250): error: expected a ";"

At end of source: warning: parsing restarts here after previous syntax error

nvsample_cudaprocess.cu(62): warning: function "pre_process" was declared but never referenced

nvsample_cudaprocess.cu(111): warning: function "post_process" was declared but never referenced

nvsample_cudaprocess.cu(201): warning: function "gpu_process" was declared but never referenced

12 errors detected in the compilation of "nvsample_cudaprocess.cu".

Hi,
Please download the default source code package:
https://developer.nvidia.com/embedded/jetson-linux-r3531

Driver Package (BSP) Sources
nvsample_cudaprocess_src.tbz2

And do development based on it.

This worked great, thank you! In the nvsample_cudaprocess.cu file, I see there are three functions in which I could implement the alpha blending:

  1. pre_process
  2. gpu_process
  3. post_process

My intuition says that gpu_process would be the fastest way to implement the alpha blending in my case, considering that I already have the alpha channel and just want to attach it to the RGB image received from the camera. Which of these would be the most suitable for achieving this?

I assume that I will either have to keep the alpha channel in GPU memory and multiply it with the RGB channels on every frame, or simply insert the already-stored alpha channel as the A value of RGBA?

Thank you,
nikolas_h

Hi,
If you would like to apply a different alpha value to each pixel, please refer to the sample that multiplies the alpha into the RGB channels. If you would like to apply a single alpha value to the whole frame, you can consider using the nvcompositor plugin; there is a property for setting it.

Reading the instructions in the folder left me confused. How many rectangles should appear?

Also, when I change the coordinates and compile again with make, none of my changes are applied.

Here is the readme file for reference:



a) Install pre-requisites
========================

1. Install the following packages on Jetson.
   sudo apt-get install libegl1-mesa-dev libgles2-mesa-dev libglvnd-dev

2. Install the NVIDIA(r) CUDA(r) toolkit (e.g., version 8.0).

   Download the package <CUDA(r) Toolkit for L4T> from the following website:
         https://developer.nvidia.com/embedded/downloads
   Ensure that the package name is consistent with the Linux userspace.

   Install the repository package with the following command:
   $ sudo dpkg -i <CUDA(r) Toolkit for L4T>

   Install the toolkit with the following commands:
   $ sudo apt-get update
   $ sudo apt-get install cuda-toolkit-<version>

   NOTE: Use the proper CUDA toolkit version in the above installation command (e.g., cuda-toolkit-8.0).

b) Build sample cuda sources
========================

   $ tar -xpf nvsample_cudaprocess_src.tbz2
   $ cd nvsample_cudaprocess
   $ make
   $ sudo mv libnvsample_cudaprocess.so /usr/lib/aarch64-linux-gnu/

Alternatively, set LD_LIBRARY_PATH as mentioned below instead of moving the library.
   $ export LD_LIBRARY_PATH=./

c) Run gst-launch-1.0 pipeline
========================

Pre-requisite for gstreamer-1.0: Install gstreamer-1.0 plugin using following command on Jetson

   sudo apt-get install gstreamer1.0-tools gstreamer1.0-alsa gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev libgstreamer-plugins-good1.0-dev

* Video decode pipeline:

   gst-launch-1.0 filesrc location=<filename.mp4> ! qtdemux ! h264parse ! omxh264dec ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nvoverlaysink display-id=0 -e

* Camera capture pipeline:

   gst-launch-1.0 nvcamerasrc fpsRange="30.0 30.0" ! 'video/x-raw(memory:NVMM), width=(int)3840, height=(int)2160, format=(string)I420, framerate=(fraction)30/1' ! nvtee ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nvoverlaysink display-id=0 -e

NOTE: Make sure the video is larger than 96x96

d) Programming Guide
========================

1. Sample code in nvsample_cudaprocess_src package

    nvsample_cudaprocess.cu -> sample image pre-/post- processing, CUDA processing functions
                               pre-process: draw a 32x32 green block start at (0,0)
                               cuda-process: draw 32x32 green block start at (32,32)
                               post-process: draw 32x32 green block start at (64,64)

    customer_functions.h -> API definition

2. Image processing APIs
    a. Pre-/Post- processing
        i. input parameters:
            void ** sBaseAddr       : array of mapped pointers, one per
                                      image plane.

            unsigned int * smemsize : array of actually allocated memory
                                      sizes for each image plane, no less
                                      than plane width * height.

            unsigned int * swidth   : width array for each image plane

            unsigned int * sheight  : height array for each image plane

            unsigned int * spitch   : array of actual line widths in memory
                                      for each image plane, no less than the
                                      plane width

            ColorFormat * sformat   : color format array, e.g.,
                                      * an NV12 image will have:
                                      sformat[0] = COLOR_FORMAT_Y8
                                      sformat[1] = COLOR_FORMAT_U8_V8
                                      * an RGBA image will have:
                                      sformat[0] = COLOR_FORMAT_RGBA

            unsigned int nsurfcount : number of planes of the current image
                                      type

            void ** userPtr         : points to the customer-allocated
                                      buffer in the processing function

        ii. output parameters:
            none

    b. CUDA processing
        i. input parameters
            EGLImageKHR image : input image data as an EGLImage
            void ** userPtr   : points to the customer-allocated buffer in
                                the processing functions

    c. "init" function
        This function must be named "init" and accept a pointer to a
        CustomerFunction structure, which contains 3 function pointers
        pointing to the pre-processing, cuda-processing, and post-processing
        functions respectively. For details, please refer to
        customer_functions.h and nvsample_cudaprocess.cu

    d. "deinit" function
        This function must be named "deinit", and is called when the pipeline is
        stopping

    e. notes
        a customer processing lib:
            MUST have an "init" function, which sets the corresponding
                functions in the nvivafilter plugin;
            MAY have a pre-processing function; if not implemented, set it
                to NULL in the "init" function;
            MAY have a cuda-processing function; if not implemented, set it
                to NULL in the "init" function;
            MAY have a post-processing function; if not implemented, set it
                to NULL in the "init" function;
            MAY have a "deinit" function, if the customer functions need to
                deinitialize when the pipeline is stopping

3. Processing Steps
    a. nvivafilter plugin input and output
        input : (I420, NV12) NVMM buffer; this is NVIDIA's internal frame
                format and may be in pitch-linear or block-linear layout.
        output: (NV12, RGBA) NVMM buffer, with the layout transformed from
                block linear to pitch linear; the processed result can be
                stored in place in this buffer.

    b. nvivafilter plugin properties
        i.   customer-lib-name
            string: absolute path and .so name of your lib, or just the .so
            name if it is in the dynamic library search path.

        ii.  pre-process
            bool: dynamically control whether to do pre-process if
                  pre-process function is implemented and set to plugin

        iii. cuda-process
            bool: dynamically control whether to do cuda-process if
                  cuda-process function is implemented and set to plugin

        iv.  post-process
            bool: dynamically control whether to do post-process if
                  post-process function is implemented and set to plugin

    c. processing order
        customer processing functions will be invoked strictly in the
        following order, if they are implemented and set:
            pre-processing -> cuda-processing -> post-processing
        the plugin properties pre-process/cuda-process/post-process can be
        used to dynamically enable/disable each processing step respectively.
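
To make the "init" hookup described in d) and e) above concrete, here is a minimal sketch (the function-pointer field names follow customer_functions.h from the source package; verify them against your release):

extern "C" void
init(CustomerFunction *pFuncs)
{
  pFuncs->fPreProcess  = pre_process;   /* or NULL if not implemented */
  pFuncs->fGPUProcess  = gpu_process;   /* or NULL if not implemented */
  pFuncs->fPostProcess = post_process;  /* or NULL if not implemented */
}

extern "C" void
deinit(void)
{
  /* release anything the processing functions allocated */
}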

Edit: I managed to figure out that I have to set pre-process, cuda-process and post-process to true in nvivafilter for it to work. However, the problem remains that I cannot change the coordinates of the green block. I tried changing BOX_H, BOX_W, COORD_X and COORD_Y, then ran make in the terminal and ran the GStreamer command again, but the changes do not apply. Can you please show me how to debug this so I can move on?
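
(One thing worth checking here, judging from step b) of the README above rather than from this thread: gst-launch-1.0 loads libnvsample_cudaprocess.so from the dynamic library search path, so a rebuilt library only takes effect if it is copied there again, or if customer-lib-name is given an absolute path to the freshly built .so:

   $ make
   $ sudo cp libnvsample_cudaprocess.so /usr/lib/aarch64-linux-gnu/

or

   ... ! nvivafilter cuda-process=true customer-lib-name="/path/to/nvsample_cudaprocess/libnvsample_cudaprocess.so" ! ...)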

Hi,
Please refer to the sample in
Nvcompositor plugs alpha is not work and alpha plugs no work - #14 by DaneLLL

The variable dat is the alpha value:

static char dat = 0;

It is applied to R,G,B channels in

__global__ void addLabelsKernel(int* pDevPtr, int pitch,int height, char dat){
  int row = blockIdx.y*blockDim.y + threadIdx.y;
  int col = blockIdx.x*blockDim.x + threadIdx.x;
  // col indexes bytes within a line; (col % 4) < 3 selects the R/G/B bytes
  // of each RGBA pixel and skips the A byte
  if (col < pitch && row < height && (col % 4) < 3) {
    char * pElement = (char*)pDevPtr + row * pitch + col;
    int scaled = (int)(*pElement)*(int)dat / 256;
    pElement[0] = (char)scaled;
  }
  return;
}
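
A kernel like this is launched over the whole pitch x height byte area. A minimal launch sketch (the block size and wrapper name are assumptions, not the exact code from the linked sample):

static void launch_addLabels(void *pDevPtr, int pitch, int height, char dat)
{
  dim3 threadsPerBlock(32, 32);
  // round the grid up so it covers every byte; the kernel's bounds check
  // discards the out-of-range threads
  dim3 blocks((pitch + threadsPerBlock.x - 1) / threadsPerBlock.x,
              (height + threadsPerBlock.y - 1) / threadsPerBlock.y);
  addLabelsKernel<<<blocks, threadsPerBlock>>>((int *)pDevPtr, pitch, height, dat);
  cudaDeviceSynchronize();
}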

Please try to successfully run this sample first, and then you can refer to it for further customization.

Hello,

I have been experimenting a bit with the script. I think I got it to work.

I ran the following and got it working:

gst-launch-1.0 videotestsrc pattern=snow ! video/x-raw,width=1280,height=720,framerate=30/1 ! queue ! comp.sink_0 videotestsrc ! video/x-raw,width=640,height=480,framerate=30/1 ! nvvidconv ! 'video/x-raw(memory:NVMM)' ! nvivafilter cuda-process=1 customer-lib-name=./libnvsample_cudaprocess.so ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvvidconv ! queue ! comp.sink_1 compositor name=comp sink_0::xpos=0 sink_0::ypos=0 sink_0::zorder=1 sink_1::xpos=0 sink_1::ypos=0 sink_1::zorder=2 ! videoconvert ! xvimagesink

After modifying the script with your suggestion and running the same command as before, I get the following:

Edit:
What exactly are pElement, pitch and pDevPtr?

Hi,
*pElement is the value of one R/G/B channel byte. The if condition picks out the R/G/B channels:

if (col < pitch && row < height && (col % 4) < 3)

pitch is the buffer pitch/stride. If there is no additional alignment, it is width*4 for RGBA: each channel takes one byte, so one line is width*4 bytes.

pDevPtr is the CUDA pointer to the buffer.
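
As a worked example (assuming no pitch padding, which NVMM buffers do not guarantee):

   // 640x480 RGBA, assuming no extra alignment:
   //   pitch       = 640 * 4        = 2560 bytes per line
   //   buffer size = pitch * height = 2560 * 480 = 1228800 bytes
   //   the byte at (row, col) is (char *)pDevPtr + row * pitch + col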

Thank you for your answer!

If the pitch = 8 and pDevPtr = 2560, then what are the sizes in the case of a 640x480 image?

As I mentioned, I already have the alpha channel with constant values that will not change while the pipeline is running (each pixel’s value differs from the others).

I haven’t found a way to load the alpha-channel file (a grayscale image in PBM format), so I decided to compute it once in the nvivafilter and keep it allocated in GPU memory, so that whenever a frame comes in, its alpha values are replaced with those precomputed values. Is this an efficient way? Can it be done?

If yes, how do I declare something that is computed only once in the nvivafilter and keep it allocated in GPU or CPU memory so it can be used whenever needed?

From this Creating Cuda Filter for GStreamer with RGBA IN/OUT and Zero Copy - #6 by Tom_Bond, it seems that pre-process and post-process are only called once. Therefore, it seems that the pre-process function is more suitable for my application. However, when I put

printf("Hello World");

in the pre-process function, it seems that it is called for every frame and not just once.

Hi,
Do you mean your source is a single image? If it is a live source such as a camera generating frame data at 30 fps, there are 30 frames per second and each frame does not have the alpha effect. You have to apply the effect to each frame so that it shows consistently in the camera preview.

My source is a camera, and the alpha channel is a single grayscale image that needs to be applied to each frame using the nvivafilter; all that is needed is to copy the grayscale values, which are allocated somewhere in either CPU or GPU memory.

The major problem that I am facing now is how to allocate this memory without it being freed on the next frame, i.e. the next time the nvivafilter is called in the pipeline.

Hi,
A possible solution is to allocate a CUDA buffer and copy the alpha values into it, so that they can be read by the GPU cores and applied to the B/G/R channels. Please refer to the attached patch:
nvsample_cudaprocess.zip (3.0 KB)

And try the command:

$ gst-launch-1.0 videotestsrc num-buffers=66 is-live=1 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080' ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! nvv4l2h265enc ! h265parse ! matroskamux ! filesink location=a.mkv

For your use-case, you can replace the following code by reading data from the grayscale image:

  cur = (char *)p_temp;
  // assign each pixel an alpha value
  for (i = 0; i < HEI; i++)
  {
    for (j = 0; j < WID; j++)
    {
      *cur = i % 256;
      cur++;
    }
  }
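
The pattern behind the patch is to fill a host buffer once and upload it to a device buffer that later frames reuse. A minimal sketch of that one-time setup, with illustrative names (WID, HEI and p_temp mirror the patch; ensure_alpha_buffer and the NULL guard are assumptions, and the attached patch remains the authoritative version):

#include <cuda_runtime.h>
#include <stdlib.h>

#define WID 1920
#define HEI 1080

static void *pBufPtr = NULL;  /* device copy of the alpha plane */

/* Allocate the CUDA buffer and upload the alpha plane on the first frame;
   later frames reuse the same device buffer. Free it in deinit. */
static void ensure_alpha_buffer(void)
{
  if (pBufPtr != NULL)
    return;

  char *p_temp = (char *)malloc(WID * HEI);
  /* ... fill p_temp here, e.g. with values read from the grayscale file ... */

  cudaMalloc(&pBufPtr, WID * HEI);
  cudaMemcpy(pBufPtr, p_temp, WID * HEI, cudaMemcpyHostToDevice);
  free(p_temp);
}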

Thank you for your helpful response and this is a very useful answer!

Do you know why the pixels with a low alpha value appear black instead of transparent? A similar example is in the images that I posted here

Hi,
In the demo patch, the RGB channels are changed like:

  // (col % 4) < 3 selects the R/G/B bytes of each RGBA pixel
  if (col < pitch && row < height && (col % 4) < 3) {
    if ((col >> 2) < WID)   // col >> 2 is the pixel's x coordinate
    {
      char * pElement = (char*)pDevPtr + row * pitch + col;
      char * pAlpha =  (char*)pBufPtr + row * WID + (col >> 2);
      int scaled = (int)(*pElement)*(int)(*pAlpha) / 256;
      pElement[0] = (char)scaled;
    }
  }

If alpha is 0, the pixel becomes R=0x0, G=0x0, B=0x0. To appear transparent, it would need a background color, modifying the RGB channels to be

alpha*channel_color + (1-alpha)*background_color
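
A minimal sketch of that formula as a kernel, assuming a constant background color and the 8-bit alpha plane in pBufPtr (the structure mirrors the patch; bgR/bgG/bgB and the kernel name are illustrative additions):

__global__ void blendBackgroundKernel(char *pDevPtr, int pitch, int height,
                                      char *pBufPtr, int wid,
                                      unsigned char bgR, unsigned char bgG,
                                      unsigned char bgB)
{
  int row = blockIdx.y * blockDim.y + threadIdx.y;
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  if (col < pitch && row < height && (col % 4) < 3) {
    int x = col >> 2;                      /* pixel x coordinate */
    if (x < wid) {
      unsigned char *pElement = (unsigned char *)pDevPtr + row * pitch + col;
      unsigned char a = ((unsigned char *)pBufPtr)[row * wid + x];
      /* pick the background byte matching this channel (byte 0 = R) */
      unsigned char bg = (col % 4 == 0) ? bgR : (col % 4 == 1) ? bgG : bgB;
      /* alpha*channel + (1-alpha)*background, with alpha in [0,255] */
      *pElement = (unsigned char)(((int)(*pElement) * a + (int)bg * (255 - a)) / 255);
    }
  }
}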

The way I understand it is that in the 1st if statement you get the RGB values out of the RGBA data. If (col % 4) < 3 selects the RGB bytes, is the remaining case, (col % 4) == 3, the alpha channel, or something else that is not needed? Quite confusing.

Then, you apply the following to the col values that, when bit-shifted right by 2 (i.e. converted from a byte index to a pixel index), fall within WID (2nd if):
– You locate the alpha value in the CUDA-allocated memory and apply it scaled to [0,1].

Where does the operation (R=0x0, G=0x0 , B=0x0) take place?

I modified your example to add a background using compositor, but it doesn’t work as expected. Can you please confirm this as well?

Is there an alternative way to add a background ?

$ gst-launch-1.0 videotestsrc num-buffers=66 is-live=1 ! video/x-raw,width=1920,height=1080,framerate=30/1 ! comp.sink_0 videotestsrc pattern=snow num-buffers=66 is-live=1 ! video/x-raw,width=1920,height=1080,framerate=30/1 ! nvvidconv ! 'video/x-raw(memory:NVMM)' ! nvivafilter cuda-process=true customer-lib-name=./libnvsample_cudaprocess.so ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvvidconv ! queue ! comp.sink_1 compositor name=comp sink_0::xpos=0 sink_0::ypos=0 sink_0::zorder=1 sink_1::xpos=0 sink_1::ypos=0 sink_1::zorder=2 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! nvv4l2h265enc ! h265parse ! matroskamux ! filesink location=b.mkv

Hi,
For RGBA data, each pixel has 4 bytes: one byte R, one byte G, one byte B, and one byte A. (col % 4) < 3 selects the R, G, and B bytes.

For adding a background color, please implement it in nvsample_cudaprocess.cu. This is public code; please customize it for your use case.
