Accessing mapped address in a CUDA Kernel

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• T4
• DeepStream Version
• 5.0
• TensorRT Version
• 7.0
• NVIDIA GPU Driver Version (valid for GPU only)
• 450.51.05

Hello all,

I’m attempting to create a CUDA kernel function which does image flipping. This is a trivial problem but this is more for testing the customization of Deepstream and the plausibility of doing custom transformations on frame data. The issue I’m running into is that when I try to access the frame data located here:


and pass this data into a CUDA kernel, I get the error:

Cuda failure: status=700

The function I’m using which creates the error is:

__global__ void flip(uchar *in_mat, uchar *out_mat, int rows, int cols, int pitch)
	int x = blockIdx.x * blockDim.x + threadIdx.x;
	int y = blockIdx.y * blockDim.y + threadIdx.y;

	if ((y < rows) && (x < cols))
		// T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;
		uchar* out_row = (uchar*)((char*)out_mat + y * pitch);
		uchar* in_row = (uchar*)((char*)in_mat + y * pitch);
		out_row[x] = in_row[cols - 1 - x];
		out_row[cols - 1 - x] = in_row[x];

The out_mat is allocated using:

cudaMallocPitch(&d_out_mat, &pitch, cols * elemSize, rows);

elemSize is 24 bytes since this is a 4-channeled image.
Whether or not the implementation of the flip function is correct. I’ve even tried just accessing the first element in the array and still get a 700 error. I should also mention that I’m not mapping the data to CPU or Syncing data to the CPU.
I’m assuming that the mappedAddr is already mapped on the GPU or this assumption incorrect?
Is there something I’m missing or is this customization capability of accessing the raw frame data on the GPU not allowed in Deepstream? Another question is can I set the mappedAddr.addr pointer to this newly created device array?


mappedAddr looks like a CPU buffer pointer.
You can check this sample for information:

in_mat =
  cv::Mat (surface->surfaceList[frame_meta->batch_id].planeParams.height[0],
  surface->surfaceList[frame_meta->batch_id].planeParams.width[0], CV_8UC4,

Have you tried dataPtr?


Hi @AastaLLL,

Thank you for your speedy reply! Using the dataPtr did the trick but now when I try to modify the pointer the data doesn’t actually get modified when I attempt to save the image frames. I’m aware of that these functions NvBufSurfaceSyncForCpu and NvBufSurfaceSyncForDevice don’t actually work for dGPU. The only thing that does work is creating a destination array then doing a cudaMemcpy2D into the original data pointer array which seems like wasteful cycles. Is this the only solution for modifying the original pointer or there some other method for modifying the original address space which the frame occupies?


May I know why do you need to modify the buffer pointer?
Usually, we don’t change the buffer but reflash the value of a pre-allocated buffer.


@AastaLLL don’t think there is a specific reason for modifying the buffer pointer but merely trying to store the changed frame back into the same NvBufSurface. Are you saying to create a new dataPtr and re-assign that dataPtr back onto the NvSurfaceParams (freeing the old pointer)?


The buffer pointer is mapped to enable the CPU and GPU access.
It won’t work if you only update the pointer. (actually, it should be handled as read-only)

You may need to re-map the buffer from nvbuf to change the buffer location.

@AastaLLL, do you mean explicitly unmapping and re-mapping the NvBuf surface? I don’t think that solves the problem of modifying the underlying data buffer. The only way I can see is to either directly modify the dataPtr in NvBufSurfaceParams or create an entirely new NvBufSurface and adding that to the GstBuffer. However, I know that this won’t work since the GstBuffer and NvBufSurface are privately hooked together (I saw this in a forum post here but cannot find that post anymore).


Sorry for my unclear statement before.
It looks like that you cannot replace the buffer location among the pipeline.

If you want to store the output frame back to the NvBufSurface.
It’s worthy to try to re-flash the value instead of replace the buffer pointer.