Hardware Platform Jetson Orin AGX 32GB DevKit
JetPack Version 5.1.1
I am trying to write a plugin for manipulating video in a gstreamer pipeline using CUDA on the AGX Orin 32 GB. I found how to get a device pointer to the EGLImage from the NvBufSurface object from the nvbufsurface.h header file in DeepStream 6.2 and this forum post. I tried the method described but I still get the error “CUDA error: an illegal memory access was encountered.”
I am using the gst-nvdsvideotemplate as the base plugin and I call my own function on the input surface object to transform it. Here is my function:
void transformSurface(NvBufSurface *surface) {
CUresult status;
CUeglFrame eglFrame;
CUgraphicsResource pResource = NULL;
EGLImageKHR eglimage_src = NULL;
if(surface->memType == NVBUF_MEM_SURFACE_ARRAY) {
if (NvBufSurfaceMapEglImage (surface, -1) != CUDA_SUCCESS) {
printf("NvBufSurfaceMapEglImage failed\n");
}
eglimage_src = surface->surfaceList[0].mappedAddr.eglImage;
status = cuGraphicsEGLRegisterImage(&pResource, eglimage_src, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
if (status != CUDA_SUCCESS)
{
printf("cuGraphicsEGLRegisterImage failed: %d, cuda process stop\n", status);
exit (-1);
};
status = cuGraphicsResourceGetMappedEglFrame(&eglFrame, pResource, 0, 0);
if (status != CUDA_SUCCESS)
{
printf("cuGraphicsResourceGetMappedEglFrame failed\n");
}
status = cuCtxSynchronize();
if (status != CUDA_SUCCESS) {
printf("cuCtxSynchronize failed\n");
}
void* gpu_ptr;
gpu_ptr = eglFrame.frame.pPitch[0];
custom_kernel(gpu_ptr, eglFrame);
// Unregister resource and unmap EGL image
cuGraphicsUnregisterResource(pResource);
NvBufSurfaceUnMapEglImage(surface, -1);
}
}
This function calls this CUDA kernel which at this point is simply checking if it can modify the first byte in the buffer as a test.
__global__ void validate_memory_kernel(uint8_t *data)
{
if (threadIdx.x == 0 && threadIdx.y == 0 && blockIdx.x == 0 && blockIdx.y == 0)
{
data[0] = 255;
}
}
bool custom_kernel(void* gpu_ptr, CUeglFrame eglFrame) {
dim3 grid(1, 1); // Single block in the grid
dim3 block(1, 1); // Single thread in the block
validate_memory_kernel<<<grid, block>>>((uint8_t *)gpu_ptr);
cudaError_t err = cudaGetLastError();
if (err != cudaSuccess) {
printf("Error: %s\n", cudaGetErrorString(err));
}
CHECK_CUDA(cudaDeviceSynchronize());
return true;
}
In case it matters, this is the pipeline I am testing with:
GST_PLUGIN_PATH=plugin gst-launch-1.0 \
filesrc location=/opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.mp4 ! decodebin ! nvvideoconvert ! \
'video/x-raw(memory:NVMM), format=RGBA, width=1280, height=720' ! \
nvdsvideotemplate customlib-name=lib/libnvds_videotemplate_impl.so ! \
identity silent=1 ! nvegltransform ! nveglglessink
I also printed out the information about the NvBufSurface that my function is operating on:
NvBufSurface Information:
GPU ID: 0
Batch Size: 1
Number of Filled Buffers: 1
Is Contiguous: No
Memory Type: NVBUF_MEM_SURFACE_ARRAY
Buffer #0:
Width: 1280, Height: 720
Pitch: 5120
Color Format: RGBA
Layout: NVBUF_LAYOUT_PITCH
DMABUF FD: 132
Data Size: 3801088
Plane #0:
width: 1280
height: 720
pitch: 5120
offset: 0
plane size: 3801088
bytes per pixel: 4
scan format: progressive
Start of Valid Data: 0
Chroma Subsampling Location Horizontal: 0
Chroma Subsampling Location Vertical: 0
Is Protected: No
Is there some important step that I am missing for how to access the memory from CUDA? I think my error has something to do with improperly passing host/device pointers, but the unified memory on the Orin makes this confusing and as far as I can tell I am following the same steps that I found in the DeepStream headers.