Resources for custom nvivafilter for Jetpack 5.1.1 Jetson NX

Hello,

After an exhausting search of the forums and the rest of the internet, I found no documentation on customizing the nvivafilter. The fact that I am a beginner in CUDA programming makes it even more frustrating.

I am trying to figure out how to customize the nvivafilter for alpha blending (per-pixel method), where the alpha channel consists of constants stored in a file and does not need to be computed in the GStreamer pipeline. I found a few solutions to start with, but I find it very difficult to even begin. Can you provide me with the necessary resources and help me out here?

The following are the resources I found that are most relevant to my problem:

Thanks in advance,
Kind regards,
nikolas_h

Hi,
NvBufSurface APIs are more flexible, so we would suggest using that interface. You can start from the samples:

/usr/src/jetson_multimedia_api/samples

The function calls for getting the GPU pointer are demonstrated in cuda_postprocess() in

/usr/src/jetson_multimedia_api/samples/12_v4l2_camera_cuda
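
For orientation, the EGLImage-to-CUDA-pointer flow used in those samples follows the pattern sketched below (a sketch only, with error handling omitted; the exact code in cuda_postprocess() may differ):

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <cudaEGL.h>

/* Map an EGLImage to a CUDA device pointer and hand it to a kernel. */
static void process_egl_image(EGLImageKHR egl_image)
{
  CUgraphicsResource resource;
  CUeglFrame egl_frame;

  cuGraphicsEGLRegisterImage(&resource, egl_image,
                             CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
  cuGraphicsResourceGetMappedEglFrame(&egl_frame, resource, 0, 0);
  cuCtxSynchronize();

  /* egl_frame.frame.pPitch[0] now points to plane 0 in device memory;
     launch CUDA kernels on it here. */

  cuCtxSynchronize();
  cuGraphicsUnregisterResource(resource);
}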

If you prefer using the nvivafilter plugin, please build this sample and give it a try:
Nvcompositor plugs alpha is not work and alpha plugs no work - #14 by DaneLLL

Hello DaneLLL,

It is important for my application to use the GStreamer features as well. So, from the resources that I found, I concluded that I am restricted to using the nvivafilter plugin to achieve my objective. The samples in jetson_multimedia_api are for developers who do not intend to use GStreamer.

Am I missing something here?
Can I connect the jetson_multimedia_api (in case it is more straightforward for me) with GStreamer?

Also, I’ve tried to run the 12_v4l2_camera_cuda example by copying the files to my home folder and then running make in the sample’s folder. The build finished successfully, but when running:

./v4l2_camera_cuda -s 640x480 -f UYVY -v

I get the following error, so I cannot pass -c on the command line:

[ERROR] (NvEglRenderer.cpp:386) Could not get EglImage from fd. Not rendering

I have also tried the suggestion you mentioned for the nvivafilter plugin; however, I get errors and cannot move forward.

I have downloaded and unzipped the file in my home folder. When I run:

nvcc nvsample_cudaprocess.cu

I get the following error:

nvsample_cudaprocess.cu(47): error: identifier "BBOX" is undefined

nvsample_cudaprocess.cu(67): error: identifier "ColorFormat" is undefined

nvsample_cudaprocess.cu(76): error: identifier "COLOR_FORMAT_U8_V8" is undefined

nvsample_cudaprocess.cu(84): error: identifier "COLOR_FORMAT_RGBA" is undefined

nvsample_cudaprocess.cu(116): error: identifier "ColorFormat" is undefined

nvsample_cudaprocess.cu(127): error: identifier "COLOR_FORMAT_U8_V8" is undefined

nvsample_cudaprocess.cu(135): error: identifier "COLOR_FORMAT_RGBA" is undefined

nvsample_cudaprocess.cu(180): error: identifier "NUM_LOCATIONS" is undefined

nvsample_cudaprocess.cu(249): error: incomplete type is not allowed

nvsample_cudaprocess.cu(249): error: identifier "CustomerFunction" is undefined

nvsample_cudaprocess.cu(249): error: identifier "pFuncs" is undefined

nvsample_cudaprocess.cu(250): error: expected a ";"

At end of source: warning: parsing restarts here after previous syntax error

nvsample_cudaprocess.cu(62): warning: function "pre_process" was declared but never referenced

nvsample_cudaprocess.cu(111): warning: function "post_process" was declared but never referenced

nvsample_cudaprocess.cu(201): warning: function "gpu_process" was declared but never referenced

12 errors detected in the compilation of "nvsample_cudaprocess.cu".

Hi,
Please download the default source code package:
https://developer.nvidia.com/embedded/jetson-linux-r3531

Driver Package (BSP) Sources
nvsample_cudaprocess_src.tbz2

And do development based on it.

This worked great, thank you! In the nvsample_cudaprocess.cu file, I see there are three functions in which I could implement the alpha blending:

  1. pre_process
  2. gpu_process
  3. post_process

My intuition says that gpu_process would be the fastest way to implement the alpha blending in my case, considering that I already have the alpha channel and just want to attach it to the RGB image received from the camera. Which of these would be the most suitable for achieving this?

I assume that I will either have to keep the alpha channel in GPU memory and multiply it with the RGB channels on every frame, or simply insert the already-stored alpha channel as the A value of RGBA?

Thank you,
nikolas_h

Hi,
If you would like to apply a different alpha value to each pixel, please refer to the sample that multiplies the alpha into the RGB channels. If you would like to apply a single alpha value to the whole frame, you can consider using the nvcompositor plugin; there is a property for setting it.

Reading the instructions in the folder left me confused. How many rectangles should appear?

Also, when I change the coordinates and compile again with make, none of my changes are applied.

Here is the readme file for reference:



a) Install pre-requisites
========================

1. Install the following packages on Jetson.
   sudo apt-get install libegl1-mesa-dev libgles2-mesa-dev libglvnd-dev

2. Install the NVIDIA(r) CUDA(r) toolkit (e.g., version 8.0).

   Download the package <CUDA(r) Toolkit for L4T> from the following website:
         https://developer.nvidia.com/embedded/downloads
   Ensure that the package name is consistent with the Linux userspace.

   Install the repository package with the following command:
   $ sudo dpkg -i <CUDA(r) Toolkit for L4T>

   Install the toolkit with the following commands:
   $ sudo apt-get update
   $ sudo apt-get install cuda-toolkit-<version>

   NOTE: Use the proper CUDA toolkit version in the above installation command (e.g., cuda-toolkit-8.0).

b) Build sample cuda sources
========================

   $ tar -xpf nvsample_cudaprocess_src.tbz2
   $ cd nvsample_cudaprocess
   $ make
   $ sudo mv libnvsample_cudaprocess.so /usr/lib/aarch64-linux-gnu/

Alternatively, set LD_LIBRARY_PATH as mentioned below instead of moving the library.
   $ export LD_LIBRARY_PATH=./

c) Run gst-launch-1.0 pipeline
========================

Pre-requisite for gstreamer-1.0: Install gstreamer-1.0 plugin using following command on Jetson

   sudo apt-get install gstreamer1.0-tools gstreamer1.0-alsa gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad gstreamer1.0-plugins-ugly gstreamer1.0-libav libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev libgstreamer-plugins-good1.0-dev

* Video decode pipeline:

   gst-launch-1.0 filesrc location=<filename.mp4> ! qtdemux ! h264parse ! omxh264dec ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nvoverlaysink display-id=0 -e

* Camera capture pipeline:

   gst-launch-1.0 nvcamerasrc fpsRange="30.0 30.0" ! 'video/x-raw(memory:NVMM), width=(int)3840, height=(int)2160, format=(string)I420, framerate=(fraction)30/1' ! nvtee ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nvoverlaysink display-id=0 -e

NOTE: Make sure the video is larger than 96x96

d) Programming Guide
========================

1. Sample code in nvsample_cudaprocess_src package

    nvsample_cudaprocess.cu -> sample image pre-/post- processing, CUDA processing functions
                               pre-process: draw a 32x32 green block start at (0,0)
                               cuda-process: draw 32x32 green block start at (32,32)
                               post-process: draw 32x32 green block start at (64,64)

    customer_functions.h -> API definition

2. Image processing APIs
    a. Pre-/Post- processing
        i. input parameters:
            void ** sBaseAddr       : array of mapped pointers, one per
                                      image plane.

            unsigned int * smemsize : array of actually allocated memory
                                      sizes for each image plane, no less
                                      than plane width * height.

            unsigned int * swidth   : width array for each image plane

            unsigned int * sheight  : height array for each image plane

            unsigned int * spitch   : array of actual line widths in memory
                                      for each image plane, no less than the
                                      plane width

            ColorFormat * sformat   : color format array, e.g.,
                                      * an NV12 image will have:
                                      sformat[0] = COLOR_FORMAT_Y8
                                      sformat[1] = COLOR_FORMAT_U8_V8
                                      * an RGBA image will have:
                                      sformat[0] = COLOR_FORMAT_RGBA

            unsigned int nsurfcount : number of planes of the current image
                                      type

            void ** userPtr         : points to the customer-allocated
                                      buffer in the processing function

        ii. output parameters:
            none

    b. CUDA processing
        i. input parameters
            EGLImageKHR image : input image data as an EGLImage
            void ** userPtr   : points to the customer-allocated buffer in
                                the processing functions

    c. "init" function
        This function must be named "init" and accept a pointer to a
        CustomerFunction structure, which contains 3 function pointers
        pointing to the pre-processing, cuda-processing, and post-processing
        functions respectively. For details, please refer to
        customer_functions.h and nvsample_cudaprocess.cu

    d. "deinit" function
        This function must be named "deinit", and is called when the pipeline is
        stopping

    e. notes
        a customer processing lib:
            MUST have an "init" function, which sets the corresponding
                functions in the nvivafilter plugin;
            MAY have a pre-processing function; if not implemented, set it
                to NULL in the "init" function;
            MAY have a cuda-processing function; if not implemented, set it
                to NULL in the "init" function;
            MAY have a post-processing function; if not implemented, set it
                to NULL in the "init" function;
            MAY have a "deinit" function, if the customer functions need to
                deinitialize when the pipeline is stopping

3. Processing Steps
    a. nvivafilter plugin input and output
        input : (I420, NV12) NVMM buffer; this is NVIDIA's internal frame
                format and may be in pitch-linear or block-linear layout.
        output: (NV12, RGBA) NVMM buffer, with the layout transformed from
                block linear to pitch linear; the processed result can be
                stored in place in this buffer.

    b. nvivafilter plugin properties
        i.   customer-lib-name
            string: absolute path and .so name of your lib, or just the .so
            name if it is in the dynamic library search path.

        ii.  pre-process
            bool: dynamically control whether to do pre-process if
                  pre-process function is implemented and set to plugin

        iii. cuda-process
            bool: dynamically control whether to do cuda-process if
                  cuda-process function is implemented and set to plugin

        iv.  post-process
            bool: dynamically control whether to do post-process if
                  post-process function is implemented and set to plugin

    c. processing order
        customer processing functions will be invoked strictly in the
        following order, if they are implemented and set:
            pre-processing -> cuda-processing -> post-processing
        the plugin properties pre-process/cuda-process/post-process can be
        used to dynamically enable/disable each processing step respectively.
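
To make the "init" hookup described in d) and e) above concrete, here is a minimal sketch (the function-pointer field names follow customer_functions.h from the source package; verify them against your release):

extern "C" void
init(CustomerFunction *pFuncs)
{
  pFuncs->fPreProcess  = pre_process;   /* or NULL if not implemented */
  pFuncs->fGPUProcess  = gpu_process;   /* or NULL if not implemented */
  pFuncs->fPostProcess = post_process;  /* or NULL if not implemented */
}

extern "C" void
deinit(void)
{
  /* release anything the processing functions allocated */
}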

Edit: I managed to figure out that I have to set pre-process, cuda-process and post-process to true in nvivafilter for it to work. However, the problem remains that I cannot change the coordinates of the green block. I tried changing BOX_H, BOX_W, COORD_X and COORD_Y, then ran make in the terminal and ran the GStreamer command again, but the changes do not apply. Can you please show me how to debug this so I can move on?
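
(One thing worth checking here, judging from step b) of the README above rather than from this thread: gst-launch-1.0 loads libnvsample_cudaprocess.so from the dynamic library search path, so a rebuilt library only takes effect if it is copied there again, or if customer-lib-name is given an absolute path to the freshly built .so:

   $ make
   $ sudo cp libnvsample_cudaprocess.so /usr/lib/aarch64-linux-gnu/

or

   ... ! nvivafilter cuda-process=true customer-lib-name="/path/to/nvsample_cudaprocess/libnvsample_cudaprocess.so" ! ...)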

Hi,
Please refer to the sample in
Nvcompositor plugs alpha is not work and alpha plugs no work - #14 by DaneLLL

The variable dat is the alpha value:

static char dat = 0;

It is applied to R,G,B channels in

__global__ void addLabelsKernel(int* pDevPtr, int pitch,int height, char dat){
  int row = blockIdx.y*blockDim.y + threadIdx.y;
  int col = blockIdx.x*blockDim.x + threadIdx.x;
  // col indexes bytes within a line; (col % 4) < 3 selects the R/G/B bytes
  // of each RGBA pixel and skips the A byte
  if (col < pitch && row < height && (col % 4) < 3) {
    char * pElement = (char*)pDevPtr + row * pitch + col;
    int scaled = (int)(*pElement)*(int)dat / 256;
    pElement[0] = (char)scaled;
  }
  return;
}
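
A kernel like this is launched over the whole pitch x height byte area. A minimal launch sketch (the block size and wrapper name are assumptions, not the exact code from the linked sample):

static void launch_addLabels(void *pDevPtr, int pitch, int height, char dat)
{
  dim3 threadsPerBlock(32, 32);
  // round the grid up so it covers every byte; the kernel's bounds check
  // discards the out-of-range threads
  dim3 blocks((pitch + threadsPerBlock.x - 1) / threadsPerBlock.x,
              (height + threadsPerBlock.y - 1) / threadsPerBlock.y);
  addLabelsKernel<<<blocks, threadsPerBlock>>>((int *)pDevPtr, pitch, height, dat);
  cudaDeviceSynchronize();
}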

Please try to successfully run this sample first, and then you can refer to it for further customization.

Hello,

I have been experimenting a bit with the script. I think I got it to work.

I ran the following and got it working:

gst-launch-1.0 videotestsrc pattern=snow ! video/x-raw,width=1280,height=720,framerate=30/1 ! queue ! comp.sink_0 videotestsrc ! video/x-raw,width=640,height=480,framerate=30/1 ! nvvidconv ! 'video/x-raw(memory:NVMM)' ! nvivafilter cuda-process=1 customer-lib-name=./libnvsample_cudaprocess.so ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvvidconv ! queue ! comp.sink_1 compositor name=comp sink_0::xpos=0 sink_0::ypos=0 sink_0::zorder=1 sink_1::xpos=0 sink_1::ypos=0 sink_1::zorder=2 ! videoconvert ! xvimagesink

After modifying the script with your suggestion and running the same command as before, I get the following:

Edit:
What exactly are pElement, pitch and pDevPtr?

Hi,
*pElement is the value of one R/G/B channel byte. The if condition picks out the R/G/B channels:

if (col < pitch && row < height && (col % 4) < 3)

pitch is the buffer pitch/stride. If there is no additional alignment, it is width*4 for RGBA: each channel takes one byte, so one line is width*4 bytes.

pDevPtr is the CUDA pointer to the buffer.
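
As a worked example (assuming no pitch padding, which NVMM buffers do not guarantee):

   // 640x480 RGBA, assuming no extra alignment:
   //   pitch       = 640 * 4        = 2560 bytes per line
   //   buffer size = pitch * height = 2560 * 480 = 1228800 bytes
   //   the byte at (row, col) is (char *)pDevPtr + row * pitch + col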

Thank you for your answer!

If the pitch = 8 and pDevPtr = 2560, then what are the sizes in the case of a 640x480 image?

As I mentioned, I already have the alpha channel with constant values that will not change while the pipeline is running (each pixel’s value differs from the others).

I haven’t found a way to load the alpha-channel file (a grayscale image in PBM format), so I decided to compute it once in the nvivafilter and keep it allocated in GPU memory, so that whenever a frame comes in, its alpha values are replaced with those precomputed values. Is this an efficient way? Can it be done?

If yes, how do I declare something that is computed only once in the nvivafilter and keep it allocated in GPU or CPU memory so it can be used whenever needed?

From this Creating Cuda Filter for GStreamer with RGBA IN/OUT and Zero Copy - #6 by Tom_Bond, it seems that pre-process and post-process are only called once. Therefore, it seems that the pre-process function is more suitable for my application. However, when I put

printf("Hello World");

in the pre-process function, it seems that it is called for every frame and not just once.

Hi,
Do you mean your source is a single image? If it is a live source such as a camera generating frame data at 30 fps, there are 30 frames per second and each frame does not have the alpha effect. You have to apply the effect to each frame so that it shows consistently in the camera preview.

My source is a camera, and the alpha channel is a single grayscale image that needs to be applied to each frame using the nvivafilter; all that is needed is to copy the grayscale values, which are allocated somewhere in either CPU or GPU memory.

The major problem that I am facing now is how to allocate this memory without it being freed on the next frame, i.e. the next time the nvivafilter is called in the pipeline.

Hi,
A possible solution is to allocate a CUDA buffer and copy the alpha values into it, so that they can be read by the GPU cores and applied to the B/G/R channels. Please refer to the attached patch:
nvsample_cudaprocess.zip (3.0 KB)

And try the command:

$ gst-launch-1.0 videotestsrc num-buffers=66 is-live=1 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12,width=1920,height=1080' ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! nvv4l2h265enc ! h265parse ! matroskamux ! filesink location=a.mkv

For your use-case, you can replace the following code by reading data from the grayscale image:

  cur = (char *)p_temp;
  // assign each pixel an alpha value
  for (i = 0; i < HEI; i++)
  {
    for (j = 0; j < WID; j++)
    {
      *cur = i % 256;
      cur++;
    }
  }
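
The pattern behind the patch is to fill a host buffer once and upload it to a device buffer that later frames reuse. A minimal sketch of that one-time setup, with illustrative names (WID, HEI and p_temp mirror the patch; ensure_alpha_buffer and the NULL guard are assumptions, and the attached patch remains the authoritative version):

#include <cuda_runtime.h>
#include <stdlib.h>

#define WID 1920
#define HEI 1080

static void *pBufPtr = NULL;  /* device copy of the alpha plane */

/* Allocate the CUDA buffer and upload the alpha plane on the first frame;
   later frames reuse the same device buffer. Free it in deinit. */
static void ensure_alpha_buffer(void)
{
  if (pBufPtr != NULL)
    return;

  char *p_temp = (char *)malloc(WID * HEI);
  /* ... fill p_temp here, e.g. with values read from the grayscale file ... */

  cudaMalloc(&pBufPtr, WID * HEI);
  cudaMemcpy(pBufPtr, p_temp, WID * HEI, cudaMemcpyHostToDevice);
  free(p_temp);
}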

Thank you for your helpful response and this is a very useful answer!

Do you know why the pixels with a low alpha value appear black instead of transparent? A similar example is in the images that I posted here

Hi,
In the demo patch, the RGB channels are changed like:

  // (col % 4) < 3 selects the R/G/B bytes of each RGBA pixel
  if (col < pitch && row < height && (col % 4) < 3) {
    if ((col >> 2) < WID)   // col >> 2 is the pixel's x coordinate
    {
      char * pElement = (char*)pDevPtr + row * pitch + col;
      char * pAlpha =  (char*)pBufPtr + row * WID + (col >> 2);
      int scaled = (int)(*pElement)*(int)(*pAlpha) / 256;
      pElement[0] = (char)scaled;
    }
  }

If alpha is 0, the pixel becomes R=0x0, G=0x0, B=0x0. To appear transparent, it would need a background color, modifying the RGB channels to be

alpha*channel_color + (1-alpha)*background_color
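
A minimal sketch of that formula as a kernel, assuming a constant background color and the 8-bit alpha plane in pBufPtr (the structure mirrors the patch; bgR/bgG/bgB and the kernel name are illustrative additions):

__global__ void blendBackgroundKernel(char *pDevPtr, int pitch, int height,
                                      char *pBufPtr, int wid,
                                      unsigned char bgR, unsigned char bgG,
                                      unsigned char bgB)
{
  int row = blockIdx.y * blockDim.y + threadIdx.y;
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  if (col < pitch && row < height && (col % 4) < 3) {
    int x = col >> 2;                      /* pixel x coordinate */
    if (x < wid) {
      unsigned char *pElement = (unsigned char *)pDevPtr + row * pitch + col;
      unsigned char a = ((unsigned char *)pBufPtr)[row * wid + x];
      /* pick the background byte matching this channel (byte 0 = R) */
      unsigned char bg = (col % 4 == 0) ? bgR : (col % 4 == 1) ? bgG : bgB;
      /* alpha*channel + (1-alpha)*background, with alpha in [0,255] */
      *pElement = (unsigned char)(((int)(*pElement) * a + (int)bg * (255 - a)) / 255);
    }
  }
}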

The way I understand it is that in the 1st if statement you get the RGB values out of the RGBA data. If (col % 4) < 3 selects the RGB bytes, is the remaining case, (col % 4) == 3, the alpha channel, or something else that is not needed? Quite confusing.

Then, you apply the following to the col values that, when bit-shifted right by 2 (i.e. converted from a byte index to a pixel index), fall within WID (2nd if):
– You locate the alpha value in the CUDA-allocated memory and apply it scaled to [0,1].

Where does the operation (R=0x0, G=0x0 , B=0x0) take place?

I modified your example to add a background using compositor, but it doesn’t work as expected. Can you please confirm this as well?

Is there an alternative way to add a background ?

$ gst-launch-1.0 videotestsrc num-buffers=66 is-live=1 ! video/x-raw,width=1920,height=1080,framerate=30/1 ! comp.sink_0 videotestsrc pattern=snow num-buffers=66 is-live=1 ! video/x-raw,width=1920,height=1080,framerate=30/1 ! nvvidconv ! 'video/x-raw(memory:NVMM)' ! nvivafilter cuda-process=true customer-lib-name=./libnvsample_cudaprocess.so ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvvidconv ! queue ! comp.sink_1 compositor name=comp sink_0::xpos=0 sink_0::ypos=0 sink_0::zorder=1 sink_1::xpos=0 sink_1::ypos=0 sink_1::zorder=2 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! nvv4l2h265enc ! h265parse ! matroskamux ! filesink location=b.mkv

Hi,
For RGBA data, each pixel has 4 bytes: one byte R, one byte G, one byte B, and one byte A. (col % 4) < 3 selects the R, G, and B bytes.

For adding a background color, please implement it in nvsample_cudaprocess.cu. This is public code; please customize it for your use case.
