Cuda Error: Illegal Memory Access when writing CUDA code for AGX Orin

hunter.norman · August 10, 2023, 9:06pm

Hardware Platform Jetson Orin AGX 32GB DevKit
JetPack Version 5.1.1

I am trying to write a plugin for manipulating video in a gstreamer pipeline using CUDA on the AGX Orin 32 GB. I found how to get a device pointer to the EGLImage from the NvBufSurface object from the nvbufsurface.h header file in DeepStream 6.2 and this forum post. I tried the method described but I still get the error “CUDA error: an illegal memory access was encountered.”

I am using the gst-nvdsvideotemplate as the base plugin and I call my own function on the input surface object to transform it. Here is my function:

void transformSurface(NvBufSurface *surface) {
  CUresult status;
  CUeglFrame eglFrame;
  CUgraphicsResource pResource = NULL;
  EGLImageKHR eglimage_src = NULL;

  if(surface->memType == NVBUF_MEM_SURFACE_ARRAY) {

      if (NvBufSurfaceMapEglImage (surface, -1) != CUDA_SUCCESS) {
        printf("NvBufSurfaceMapEglImage failed\n");
      }

      eglimage_src = surface->surfaceList[0].mappedAddr.eglImage;
      
      status = cuGraphicsEGLRegisterImage(&pResource, eglimage_src, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
      if (status != CUDA_SUCCESS)
      {
          printf("cuGraphicsEGLRegisterImage failed: %d, cuda process stop\n", status);
          exit (-1);
      };

      status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
      if (status != CUDA_SUCCESS)
      {
          printf("cuGraphicsResourceGetMappedEglFrame failed\n");
      }

      status = cuCtxSynchronize();
      if (status != CUDA_SUCCESS) {
          printf("cuCtxSynchronize failed\n");
      }

      void* gpu_ptr;
      gpu_ptr = eglFrame.frame.pPitch[0];

      custom_kernel(gpu_ptr, eglFrame);

      // Unregister resource and unmap EGL image
      cuGraphicsUnregisterResource(pResource);
      NvBufSurfaceUnMapEglImage(surface, -1);
  }
}

This function calls this CUDA kernel which at this point is simply checking if it can modify the first byte in the buffer as a test.

__global__ void validate_memory_kernel(uint8_t *data)
{
    if (threadIdx.x == 0 && threadIdx.y == 0 && blockIdx.x == 0 && blockIdx.y == 0)
    {
        data[0] = 255;
    }
}

bool custom_kernel(void* gpu_ptr, CUeglFrame eglFrame) {

    dim3 grid(1, 1);      // Single block in the grid
    dim3 block(1, 1);     // Single thread in the block

    validate_memory_kernel<<<grid, block>>>((uint8_t *)gpu_ptr);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("Error: %s\n", cudaGetErrorString(err));
    }

    CHECK_CUDA(cudaDeviceSynchronize());

    return true;
}

In case it matters, this is the pipeline I am testing with:

GST_PLUGIN_PATH=plugin gst-launch-1.0 \
filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ! decodebin ! nvvideoconvert ! \
'video/x-raw(memory:NVMM), format=RGBA, width=1280, height=720' ! \
nvdsvideotemplate customlib-name=lib/libnvds_videotemplate_impl.so ! \
identity silent=1 ! nvegltransform ! nveglglessink

I also printed out the information about the NvBufSurface that my function is operating on:

NvBufSurface Information:
GPU ID: 0
Batch Size: 1
Number of Filled Buffers: 1
Is Contiguous: No
Memory Type: NVBUF_MEM_SURFACE_ARRAY
Buffer #0:
         Width: 1280, Height: 720
         Pitch: 5120
         Color Format: RGBA
         Layout: NVBUF_LAYOUT_PITCH
         DMABUF FD: 132
         Data Size: 3801088
        Plane #0:
                 width: 1280
                 height: 720
                 pitch: 5120
                 offset: 0
                 plane size: 3801088
                 bytes per pixel: 4
                 scan format: progressive
         Start of Valid Data: 0
         Chroma Subsampling Location Horizontal: 0
         Chroma Subsampling Location Vertical: 0
         Is Protected: No

Is there some important step that I am missing for how to access the memory from CUDA? I think my error has something to do with improperly passing host/device pointers, but the unified memory on the Orin makes this confusing and as far as I can tell I am following the same steps that I found in the DeepStream headers.

AastaLLL · August 11, 2023, 3:32am

Hi,

A common cause of illegal memory is GPU access the some CPU-only buffer.

Your implementation looks correct to me.
Would you mind sharing a complete runnable source so we can check it deeper?

Thanks.

hunter.norman · August 11, 2023, 1:15pm

Thank you for verifying it was correct. The error was a small error elsewhere in the code unrelated to the CUDA kernel. The plugin works now.

system · August 28, 2023, 6:27am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.