GPU memory allocated image to GStreamer

Hello,

I am processing an image with several NPP (nppi) functions. At the end of the processing I pass the image to GStreamer.

I have it working, but only when I send frames allocated in CPU memory. GStreamer frees the buffers itself when it is finished with them.

To get a CPU-allocated frame, I tried everything:

  1. Copying the frames manually with cudaMemcpy. It works, but it is very slow.
  2. Unified memory. It also works, but it is as slow as the manual copy with cudaMemcpy.
  3. Zero-copy. It returns error code 11 if I allocate the memory with malloc.

Zero-copy seems to work when I allocate the memory with cudaMallocHost, but that requires cudaFreeHost to free the memory, and I don’t have control over that (GStreamer frees the memory).

Any ideas? Should zero-copy work with memory allocated by malloc?

Answering my own question: zero-copy doesn’t work with malloc. It needs memory allocated with the cudaHostAlloc function.

To make that work, the buffers for GStreamer need to be created with gst_buffer_new_wrapped_full. Its last parameter (notify) takes a callback that GStreamer calls when the memory is no longer needed, so you control how the memory is released.

The processing time went from 60000us to 40us.

Hi, could you explain a bit more about the process you went through to accelerate your application? I don’t see the callback you are referring to. More details and code would be greatly appreciated.