Accelerated GStreamer text overlay

Hi,

Can you recommend a solution for displaying (burning in) a dynamic text overlay on video using GStreamer?

We have a very simple pipeline:

nvarguscamerasrc sensor-id=<DCL_SENSOR_ID> sensor-mode=0 gainrange="1 16" ispdigitalgainrange="1 1"
! video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1
! nvvidconv ! textoverlay name=text_overlay ! video/x-raw,format=I420
! nvvidconv ! nvv4l2vp8enc bitrate=8000000 control-rate=1 ! rtpvp8pay mtu=1400 ! udpsink auto-multicast=true clients=<DCL_UDP_SINK_CLIENTS>

where we change the text 5 times a second. This consumes ~50% of a CPU core on a Nano, whereas without the text overlay usage is below 20%.

Do you have any idea how we could make it more efficient or reduce the overhead?

Thanks!

Bests,
Peter

Hi,
You may try this:
Tx2-4g r32.3.1 nvivafilter performance - #16 by DaneLLL

Looks very promising! Thanks!

I'm new to CUDA. Will ‘nvivafilter’ execute the linked code the same way as a simple CUDA kernel, or are there limitations I should take into account? Based on the example nvsample_cudaprocess.cu, it seems that it will.

Also, can you point me to documentation or an example that shows how to pass data between the CPU and GPU? Our use case is a simple C++ app that receives events and updates the video overlay based on them. Is there any ‘GStreamer’-ish asynchronous, event-based communication, or should I use cudaMallocManaged to allocate variables in shared memory?
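
To make the use case concrete, here is a minimal host-side sketch of the hand-off I have in mind, written in plain C++ with no CUDA or GStreamer calls. It is only an assumption about how this could be structured, not something taken from the nvivafilter sample: the event handler stores the new text under a mutex, and the per-frame processing function takes a snapshot only when a version counter has changed, so that any copy to GPU-visible memory (for example a cudaMallocManaged buffer) would happen only on updates. The names OverlayState, set_text and snapshot_if_changed are hypothetical.

```cpp
#include <cstdint>
#include <mutex>
#include <string>

// Hypothetical helper shared between the app's event thread and the
// per-frame processing function of the filter library.
class OverlayState {
public:
    // Called from the event thread whenever the overlay text changes.
    void set_text(const std::string& text) {
        std::lock_guard<std::mutex> lock(mutex_);
        text_ = text;
        ++version_;
    }

    // Called once per frame. Copies the text into `out` and returns true
    // only if the text changed since the version the caller last saw.
    // The caller would then refresh its GPU-side copy (not shown here).
    bool snapshot_if_changed(std::uint64_t& seen_version, std::string& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (seen_version == version_) {
            return false;
        }
        seen_version = version_;
        out = text_;
        return true;
    }

private:
    std::mutex mutex_;
    std::string text_;
    std::uint64_t version_ = 0;
};
```

With 5 text updates per second at 30 fps, most frames would return false here and skip the copy entirely.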

Thanks!

Bests,
Peter

Hi,

Using this approach, we have now implemented a much more performant solution. Thank you for the suggestion!

But as I’m pretty new to GPU programming, the solution itself is naive, and I’m confident there is a lot of room for further improvement.

What I’m not sure about is how to profile or debug the nvivafilter. I looked around, and as far as I can see there are very nice tools for this task for standalone CUDA applications; nvprof and cuda-memcheck come up most often. Can I use these with GStreamer, and if so, how? Or is there any alternative for gaining performance insights from the nvivafilter?

Thanks!

I think that native profiling on Jetson is on stand-by for now.

You may use host-side tools such as Nsight to get GPU stats. Someone else may be able to advise better.

I see :( That's a bummer; it looked like a very useful tool.

But my question would still apply for these Nsight tools as well. Can these tools be used to profile CUDA code when it is referenced from a GStreamer pipeline as a shared library?

You may create a new topic for this as it gets far from original title ;-)

For anyone getting here, here are some modified files from public_sources/nvsample_cuda_process:
nvsample_cudaprocess.cu (6.5 KB)
Makefile (5.0 KB)

This may still require some cleanup work to avoid a fault when closing…

gst-launch-1.0 nvarguscamerasrc num-buffers=300 ! nvivafilter customer-lib-name="libnvsample_cudaprocess.so" post-process=true ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvvidconv ! nvoverlaysink
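
The attached files are not reproduced here. As a rough idea of the per-pixel work such a custom library could do on the RGBA frame, below is a CPU reference in plain C++ that blends a pre-rendered 8-bit text mask into the frame. On the Jetson the inner loop body would be the kind of thing a CUDA kernel runs per pixel. The function name, the mask layout and the pitch handling are assumptions for illustration, not the contents of the attachment.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference: blend an 8-bit coverage mask (0 = transparent,
// 255 = opaque) in a solid color into an RGBA frame at position (x0, y0).
// `pitch` is the frame's row stride in bytes, which may exceed width * 4.
void blend_mask_rgba(std::uint8_t* frame, int width, int height, std::size_t pitch,
                     const std::vector<std::uint8_t>& mask, int mask_w, int mask_h,
                     int x0, int y0, std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const std::uint8_t color[3] = {r, g, b};
    for (int my = 0; my < mask_h; ++my) {
        const int y = y0 + my;
        if (y < 0 || y >= height) continue;  // clip rows outside the frame
        for (int mx = 0; mx < mask_w; ++mx) {
            const int x = x0 + mx;
            if (x < 0 || x >= width) continue;  // clip columns outside the frame
            const unsigned a = mask[static_cast<std::size_t>(my) * mask_w + mx];
            std::uint8_t* px = frame + static_cast<std::size_t>(y) * pitch
                                     + static_cast<std::size_t>(x) * 4;
            for (int c = 0; c < 3; ++c) {
                // Integer alpha blend with rounding; the frame's own
                // alpha channel (px[3]) is left untouched.
                px[c] = static_cast<std::uint8_t>(
                    (a * color[c] + (255 - a) * px[c] + 127) / 255);
            }
        }
    }
}
```

The mask only needs to be re-rendered when the text changes, so the per-frame cost is limited to the blend over the text rectangle.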

Hi!

Sorry for the late response!

Thank you for the code examples! In the meantime I created a solution for the text overlay based on this:

Not as flexible as the one you mentioned, but I guess it will cut it for now.

So my last remaining question is how I can profile these applications.

But you are right, it is drifting away from the current topic :D so I created a new one:

Thanks for the help!