Cuda Error: Illegal Memory Access when writing CUDA code for AGX Orin

Hardware Platform Jetson Orin AGX 32GB DevKit
JetPack Version 5.1.1

I am trying to write a plugin for manipulating video in a gstreamer pipeline using CUDA on the AGX Orin 32 GB. I found how to get a device pointer to the EGLImage from the NvBufSurface object from the nvbufsurface.h header file in DeepStream 6.2 and this forum post. I tried the method described but I still get the error “CUDA error: an illegal memory access was encountered.”

I am using the gst-nvdsvideotemplate as the base plugin and I call my own function on the input surface object to transform it. Here is my function:

void transformSurface(NvBufSurface *surface) {
  CUresult status;
  CUeglFrame eglFrame;
  CUgraphicsResource pResource = NULL;
  EGLImageKHR eglimage_src = NULL;

  if(surface->memType == NVBUF_MEM_SURFACE_ARRAY) {

      if (NvBufSurfaceMapEglImage (surface, -1) != CUDA_SUCCESS) {
        printf("NvBufSurfaceMapEglImage failed\n");
      }

      eglimage_src = surface->surfaceList[0].mappedAddr.eglImage;
      
      status = cuGraphicsEGLRegisterImage(&pResource, eglimage_src, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
      if (status != CUDA_SUCCESS)
      {
          printf("cuGraphicsEGLRegisterImage failed: %d, cuda process stop\n", status);
          exit (-1);
      };

      status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
      if (status != CUDA_SUCCESS)
      {
          printf("cuGraphicsResourceGetMappedEglFrame failed\n");
      }

      status = cuCtxSynchronize();
      if (status != CUDA_SUCCESS) {
          printf("cuCtxSynchronize failed\n");
      }

      void* gpu_ptr;
      gpu_ptr = eglFrame.frame.pPitch[0];

      custom_kernel(gpu_ptr, eglFrame);

      // Unregister resource and unmap EGL image
      cuGraphicsUnregisterResource(pResource);
      NvBufSurfaceUnMapEglImage(surface, -1);
  }
}

This function calls this CUDA kernel which at this point is simply checking if it can modify the first byte in the buffer as a test.

__global__ void validate_memory_kernel(uint8_t *data)
{
    if (threadIdx.x == 0 && threadIdx.y == 0 && blockIdx.x == 0 && blockIdx.y == 0)
    {
        data[0] = 255;
    }
}

bool custom_kernel(void* gpu_ptr, CUeglFrame eglFrame) {

    dim3 grid(1, 1);      // Single block in the grid
    dim3 block(1, 1);     // Single thread in the block

    validate_memory_kernel<<<grid, block>>>((uint8_t *)gpu_ptr);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("Error: %s\n", cudaGetErrorString(err));
    }

    CHECK_CUDA(cudaDeviceSynchronize());

    return true;
}

In case it matters, this is the pipeline I am testing with:

GST_PLUGIN_PATH=plugin gst-launch-1.0 \
filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ! decodebin ! nvvideoconvert ! \
'video/x-raw(memory:NVMM), format=RGBA, width=1280, height=720' ! \
nvdsvideotemplate customlib-name=lib/libnvds_videotemplate_impl.so ! \
identity silent=1 ! nvegltransform ! nveglglessink

I also printed out the information about the NvBufSurface that my function is operating on:

NvBufSurface Information:
GPU ID: 0
Batch Size: 1
Number of Filled Buffers: 1
Is Contiguous: No
Memory Type: NVBUF_MEM_SURFACE_ARRAY
Buffer #0:
         Width: 1280, Height: 720
         Pitch: 5120
         Color Format: RGBA
         Layout: NVBUF_LAYOUT_PITCH
         DMABUF FD: 132
         Data Size: 3801088
        Plane #0:
                 width: 1280
                 height: 720
                 pitch: 5120
                 offset: 0
                 plane size: 3801088
                 bytes per pixel: 4
                 scan format: progressive
         Start of Valid Data: 0
         Chroma Subsampling Location Horizontal: 0
         Chroma Subsampling Location Vertical: 0
         Is Protected: No

Is there some important step that I am missing for how to access the memory from CUDA? I think my error has something to do with improperly passing host/device pointers, but the unified memory on the Orin makes this confusing and as far as I can tell I am following the same steps that I found in the DeepStream headers.

Hi,

A common cause of illegal memory is GPU access the some CPU-only buffer.

Your implementation looks correct to me.
Would you mind sharing a complete runnable source so we can check it deeper?

Thanks.

Thank you for verifying it was correct. The error was a small error elsewhere in the code unrelated to the CUDA kernel. The plugin works now.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.