GPU memory allocated image to GStreamer

cpg_To · September 18, 2018, 5:50pm

Hello,

I am processing an image using several nppi functions. At the end of the processing, I pass the image to GStreamer.

I have it working, but only when sending frames in CPU-allocated memory. GStreamer frees the buffers itself when it finishes with them.

To get the processed image into a CPU-allocated frame, I tried everything:

Copying the frames manually with cudaMemcpy. This works, but it is very slow.
Unified memory. This also works, but it is as slow as the manual copy with cudaMemcpy.
Zero-copy. This returns error code 11 if I allocate the memory with malloc.

Zero-copy seems to work when the memory is allocated with cudaMallocHost, but that requires cudaFreeHost to free the memory, and I don’t have control over that (GStreamer frees the memory).

Any ideas? Should zero-copy work with memory allocated by malloc?

cpg_To · September 18, 2018, 9:08pm

Answering my own question: zero-copy doesn’t work with malloc. It needs memory allocated with the cudaHostAlloc function.

To make that work, we need to create the buffers for GStreamer with gst_buffer_new_wrapped_full. Its last parameter is a GDestroyNotify callback, which lets you control how the memory is released.

The processing time went from 60000us to 40us.

itaidagan · December 9, 2019, 2:14pm

Hi, could you explain a bit more about the process you went through to accelerate your application? I don’t see the callback you are referring to. More details and code would be greatly appreciated.