EglImage (or CUeglFrame) preprocessing

Hi, is there any example I can follow to learn how to do image preprocessing on an EGLImage extracted via the Argus API? I’m trying to follow the jetson_multimedia_api sample 04_video_dec_trt. In that code, the EGLImage is first converted to a CUeglFrame and then converted to float for the TRT CUDA stream to process. However, in our use case, before we feed the image into the TRT model we have some preprocessing such as image downsampling, color space conversion, and normalization. I’m wondering where and how I can add that preprocessing? Any advice here? Thanks

I saw one way referenced here: convert the EGLFrame to OpenCV first and then process it. But that seems suboptimal. Any other suggestions? For context, I can’t use GStreamer or DeepStream due to legacy constraints; our program is currently a custom C++ application.

Hi,
Please refer to cuda_postprocess() in 12_camera_v4l2_cuda. You can get the CUDA pointer and implement CUDA code to process the buffers on the GPU.

Got it, thanks! That part I have figured out. Do you have any examples of CUDA image processing? Do I have to write all the processing kernels myself, or are there open libraries I can leverage? The requirements from my program are very simple: I just need resizing, normalization, and cropping.

Hi,
There is some CUDA code in
jetson-utils/cuda at def4a04d023960781a44f9cd97fd1464093becf0 · dusty-nv/jetson-utils · GitHub

Please take a look and see if it can be applied to your use case. And for resizing/cropping, you may call NvBufferTransform() to use the hardware converter.
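For reference, a minimal sketch of a crop-plus-scale call through NvBufferTransform() (assuming `src_fd` and `dst_fd` are dmabuf fds you have already allocated, and the crop rectangle is purely illustrative):

```cpp
#include "nvbuf_utils.h"

// Sketch only: crop a region out of src_fd and scale it into dst_fd
// using the hardware converter (VIC). The destination size is taken
// from dst_fd's own geometry, so allocating dst_fd at the model's
// input resolution gives you the resize for free.
int crop_and_resize(int src_fd, int dst_fd)
{
    NvBufferTransformParams params = {0};
    params.transform_flag   = NVBUFFER_TRANSFORM_CROP_SRC |
                              NVBUFFER_TRANSFORM_FILTER;
    params.transform_filter = NvBufferTransform_Filter_Smart;
    params.src_rect = {0, 0, 640, 480};  // top, left, width, height
    return NvBufferTransform(src_fd, dst_fd, &params);
}
```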

Thanks, I’ll take a look first and keep you posted if I have new questions.

Hi, I’m using the code below to convert the dmabuf fd to a CUeglFrame, following the example in cuda_postprocess.

    EGLImageKHR egl_image = NvEGLImageFromFd(eglDisplay, dmabuf_fd);
    if (egl_image == NULL) {
        fprintf(stderr, "Error while mapping dmabuf fd (0x%X) to EGLImage\n", dmabuf_fd);
    }

    CUresult status;
    CUeglFrame eglFrame;
    CUgraphicsResource pResource = NULL;

    cudaFree(0);  // no-op that forces CUDA runtime context creation
    status = cuGraphicsEGLRegisterImage(&pResource, egl_image,
                                        CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
    if (status != CUDA_SUCCESS) {
        printf("cuGraphicsEGLRegisterImage failed: %d, cuda process stop\n",
               status);
    }

    status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
    if (status != CUDA_SUCCESS) {
        printf("cuGraphicsSubResourceGetMappedArray failed\n");
    }

    status = cuCtxSynchronize();
    if (status != CUDA_SUCCESS) {
        printf("cuCtxSynchronize failed\n");
    }

When the program starts, everything works well. However, after I leave it running for several minutes, it starts to raise the errors below:

cuGraphicsEGLRegisterImage failed: 1, cuda process stop
cuGraphicsSubResourceGetMappedArray failed

I’m not sure how to resolve it. After receiving this error I tried to slow down the CUDA conversion by skipping some frames, but once it happens, no matter how much I slow things down, it just keeps raising the same error. However, if I directly use the renderer to render the dmabuf fd, I can run it for a whole night and it still works: no errors at all, and the on-screen rendering stays fine.

Hi,
Not sure, but perhaps you missed calling cuGraphicsUnregisterResource() after the processing?
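If so, one complete per-frame cycle would look roughly like this (a sketch; error handling trimmed):

```cpp
// Sketch of one complete map/process/unmap cycle per frame.
EGLImageKHR egl_image = NvEGLImageFromFd(eglDisplay, dmabuf_fd);

CUgraphicsResource pResource = NULL;
cuGraphicsEGLRegisterImage(&pResource, egl_image,
                           CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);

CUeglFrame eglFrame;
cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);

// ... launch CUDA kernels on eglFrame.frame.pPitch[0] ...
cuCtxSynchronize();

// Teardown that must run every frame; without it the registrations
// accumulate until cuGraphicsEGLRegisterImage starts failing.
cuGraphicsUnregisterResource(pResource);
NvDestroyEGLImage(eglDisplay, egl_image);
```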

Nice catch, that’s my bad. Thanks!

Hi, I have successfully used NvBufferTransform() to do the resizing and cropping. We also need two other processing steps:

  1. normalization
  2. image rotation by an arbitrary angle

I checked jetson-utils, but I haven’t been able to use it successfully. After building and installing the repo, the C++ example runs, but the Python example fails immediately with a segmentation fault on import jetson.utils, and I don’t know how to resolve that. Looking at the Python example source code, I saw the image memory is allocated with cudaHostAlloc()/cudaHostGetDevicePointer() and then passed into each processing function. My question: since I’m streaming images with NvBuffer and what I have is a dmabuf fd, how can that be processed the way the jetson-utils examples do it? One particular confusion: I know my NvBuffer is pitched, but the CUDA kernels in jetson-utils don’t seem to account for pitch at all. What should I do to handle it?

I know this is a relatively broad question, but if you can provide an example of normalization and image rotation directly (it doesn’t have to use jetson-utils), that would be helpful. I feel this is a common need for other applications too. Normalization may be the easier one, since ultimately I could also fold that step into the TRT model, but image rotation seems tricky.

Hi,
The two functions (normalization and rotation by an arbitrary angle) are not supported in NvBufferTransform(). Rotation is supported, but only at fixed angles: 90, 180, 270, plus mirroring.
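For an arbitrary angle you would need to implement your own kernel. The usual approach is inverse mapping: for each destination pixel, rotate its coordinates back into the source image and sample there. Below is a CPU reference of that math (a sketch with hypothetical names; single channel, nearest-neighbor, pitch-aware) that you could port almost line-for-line into a CUDA kernel:

```cpp
#include <cmath>
#include <cstdint>

// Rotate a single-channel image by `degrees` around its center using
// inverse mapping + nearest-neighbor sampling. `pitch` is the row
// stride in bytes, which may exceed `width` for NvBuffer-backed memory.
void rotateNearest(const uint8_t* src, uint8_t* dst,
                   int width, int height, int pitch, float degrees)
{
    const float rad = degrees * 3.14159265358979f / 180.0f;
    const float c = std::cos(rad), s = std::sin(rad);
    const float cx = (width - 1) * 0.5f, cy = (height - 1) * 0.5f;

    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            // Rotate the destination coordinate back into source space.
            float x = col - cx, y = row - cy;
            int sx = (int)std::lround( c * x + s * y + cx);
            int sy = (int)std::lround(-s * x + c * y + cy);
            dst[row * pitch + col] =
                (sx >= 0 && sx < width && sy >= 0 && sy < height)
                    ? src[sy * pitch + sx]
                    : 0;  // regions rotated in from outside become black
        }
    }
}
```

In a CUDA kernel the two nested loops become the thread grid, with row/col computed from blockIdx/threadIdx.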

And an NvBuffer cannot be passed to jetson-utils directly.

Are the required functions in jetson-utils implemented in CUDA? If so, you may move the code to

/usr/src/jetson_multimedia_api/samples/common/algorithm/cuda

to build it with jetson_multimedia_api.

It’s implemented via CUDA; I can move it over and try. But theoretically, if the NvBuffer is pitched, it requires some special handling, correct? The implementations I saw in jetson-utils are naive: they assume the valid width of the image is the same as the pitch. I’m not sure if there is anything smarter here. If you can, could you quickly check this link to see whether it supports pitch? jetson-utils/cudaNormalize.cu at def4a04d023960781a44f9cd97fd1464093becf0 · dusty-nv/jetson-utils · GitHub

Hi,
A possible solution would be:

  1. Keep the NvBuffer in uint8 RGBA
  2. Create a CUDA buffer in float RGBA and convert the uint8 RGBA NvBuffer into it
  3. Call the normalization function on the float buffer

In NvCudaProc.cpp there is code that converts an NvBuffer in uint8 RGBA into individual CUDA buffers holding float R/G/B planes. You may refer to it.
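To make the pitch question concrete: in step 2, the pitched uint8 source is indexed with the row stride in bytes, while the packed planar float destination is indexed with the width. A CPU reference of that conversion (a hypothetical helper, mirroring what NvCudaProc.cpp does on the GPU):

```cpp
#include <cstdint>

// Convert pitched uint8 RGBA to packed planar float R/G/B, applying
// per-channel normalization: out = (in - offset) * scale.
// `pitch` is the source row stride in bytes (>= width * 4 for RGBA).
void rgbaToPlanarFloat(const uint8_t* src, float* dst,
                       int width, int height, int pitch,
                       const int offsets[3], const float scales[3])
{
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            for (int k = 0; k < 3; ++k) {
                uint8_t v = src[row * pitch + col * 4 + k];   // pitched read
                dst[width * height * k + row * width + col] = // packed write
                    (float)(v - offsets[k]) * scales[k];
            }
        }
    }
}
```

The same indexing carries over directly to a CUDA kernel: read with `row * pitch`, write with `row * width`.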

Thanks, this is helpful. Let me try it out.

Hi, I’m looking at the function in NvCudaProc. Currently I’m having difficulty understanding the offsets and scales parameters in the call below.

convertIntToFloat((CUdeviceptr) eglFrame.frame.pPitch[0],
                          width,
                          height,
                          eglFrame.pitch,
                          color_format,
                          offsets,
                          scales,
                          cuda_buf);

Can you help explain a bit about what they mean? I also looked at the source of this function in NvAnalysis.cu, but it directly calls the CUDA kernel below, and I’m not sure what those two fields mean here.

__global__ void
convertIntToFloatKernelBGR(CUdeviceptr pDevPtr, int width, int height,
                void* cuda_buf, int pitch, void* offsets_gpu, void* scales_gpu)
{
    float *pdata = (float *)cuda_buf;
    char *psrcdata = (char *)pDevPtr;
    int *offsets = (int *)offsets_gpu;
    float *scales = (float *)scales_gpu;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;


    if (col < width && row < height)
    {
        // For V4L2_PIX_FMT_ABGR32 --> BGRA-8-8-8-8
        for (int k = 0; k < 3; k++)
        {
            pdata[width * height * k + row * width + col] =
                (float)(*(psrcdata + row * pitch + col * 4 + k) - offsets[k]) * scales[k];
        }
    }
}

========================= updates ==========================

Sorry, you can ignore this question. After some digging, I found the following in trt_inference.h:

struct {
        const int  classCnt;
        float      THRESHOLD[3];
        const char *INPUT_BLOB_NAME;
        const char *OUTPUT_BLOB_NAME;
        const char *OUTPUT_BBOX_NAME;
        const int  STRIDE;
        const int  WORKSPACE_SIZE;
        int        offsets[3];
        float      input_scale[3];
        float      bbox_output_scales[4];
        const int  ParseFunc_ID;
    } *g_pModelNetAttr, gModelNetAttr[4] = {
        {
            // GOOGLENET_SINGLE_CLASS
            1,
            {0.8, 0, 0},
            "data",
            "coverage",
            "bboxes",
            4,
            450 * 1024 * 1024,
            {0, 0, 0},
            {1.0f, 1.0f, 1.0f},
            {1, 1, 1, 1},
            0
        },

        {
            // GOOGLENET_THREE_CLASS
            3,
            {0.6, 0.6, 1.0},   //People, Motorbike, Car
            "data",
            "Layer16_cov",
            "Layer16_bbox",
            16,
            110 * 1024 * 1024,
            {124, 117, 104},
            {1.0f, 1.0f, 1.0f},
            {-640, -368, 640, 368},
            0
        },

        {
            // RESNET_THREE_CLASS
            4,
            {0.1, 0.1, 0.1},   //People, Motorbike, Car
            "data",
            "Layer7_cov",
            "Layer7_bbox",
            16,
            110 * 1024 * 1024,
            {0, 0, 0},
            {0.0039215697906911373, 0.0039215697906911373, 0.0039215697906911373},
            {-640, -368, 640, 368},
            1
        },
    };

It seems these are the data-normalization parameters. I understand now.
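So, in short, each output value is (pixel - offsets[k]) * scales[k]: offsets holds a per-channel value to subtract (e.g. the channel means 124/117/104 above) and scales holds a per-channel multiplier (e.g. ~1/255 to map 0–255 into 0–1, as in the RESNET entry). As a tiny helper (hypothetical name), the formula is just:

```cpp
// The per-channel formula applied by the convertIntToFloat kernel:
//   out = (pixel - offsets[k]) * scales[k]
static inline float normalizePixel(int pixel, int offset, float scale)
{
    return (float)(pixel - offset) * scale;
}

// Mean subtraction (GOOGLENET_THREE_CLASS): normalizePixel(200, 124, 1.0f) -> 76.0f
// Range scaling (RESNET_THREE_CLASS): normalizePixel(255, 0, 1.0f / 255.0f) -> ~1.0f
```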

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.