Data processing and overlay with Jetson Utils


Our project involves overlaying a number of images and text on top of RAW frames captured from a camera connected over CSI. I managed to get a nice MVP working with Python and Jetson Utils (an absolute life-saver), but now that we’re transitioning to implementing the final application, some questions have come up:

What are the options for generating the graphics? The final application will need dynamically positioned PNG images and dynamic text. I’m thinking of generating all the overlay graphics separately and then overlaying them as one image.

I’m hoping to start a discussion. This is mostly a learning opportunity for me (I have been doing bare-metal and traditional embedded work until now), so all of this is a bit new to me.

Best Regards,

Hi @alexx88, glad to hear that jetson-utils has been useful for you so far. As you can probably tell, it is a lower-level API for drawing shapes, text, etc. than a typical UI toolkit, for the sake of performance. If the images you are overlaying change position dynamically, then yes, I would keep each of them buffered in a separate image and composite them into the final image on each frame.
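To make the "separate buffers, composite every frame" idea concrete, here is a minimal CPU sketch in NumPy that alpha-blends an RGBA sprite onto a frame at a dynamic (x, y), clipping whatever falls off-screen. The function name `composite_rgba` is hypothetical and this is not how jetson-utils implements it; on the Jetson the per-frame step would run on the GPU (for example with `jetson_utils.cudaOverlay(input, output, x, y)`). The sketch only illustrates the clipping and the standard "over" blend, so check how the GPU routine you use treats alpha for your image format.

```python
import numpy as np


def composite_rgba(frame: np.ndarray, sprite: np.ndarray, x: int, y: int) -> None:
    """Alpha-blend an RGBA sprite onto an RGB or RGBA frame in place at (x, y).

    Parts of the sprite that fall outside the frame are clipped, so the
    sprite may be positioned anywhere, including partially off-screen.
    """
    fh, fw = frame.shape[:2]
    sh, sw = sprite.shape[:2]

    # Intersection of the sprite rectangle with the frame rectangle.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + sw, fw), min(y + sh, fh)
    if x0 >= x1 or y0 >= y1:
        return  # sprite is entirely off-screen

    # Matching windows into the sprite and into the frame.
    src = sprite[y0 - y:y1 - y, x0 - x:x1 - x].astype(np.float32)
    dst = frame[y0:y1, x0:x1, :3].astype(np.float32)

    alpha = src[..., 3:4] / 255.0  # per-pixel opacity in [0, 1]
    blended = src[..., :3] * alpha + dst * (1.0 - alpha)

    # Round to nearest and write back into the frame's color channels.
    frame[y0:y1, x0:x1, :3] = np.clip(blended + 0.5, 0, 255).astype(np.uint8)
```

Per frame you would start from a copy of the static background, call this once per sprite in back-to-front order, and draw the text last so it stays on top.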

My predominant use case for jetson-utils / CUDA 2D rendering is to still be able to do a simple HUD/UI/overlay on live video without a display attached. If your application has a display, you can do more advanced graphics in OpenGL. For user interactivity (buttons, text input, etc.), I mostly make web UIs these days, again so that no display needs to be physically attached to the Jetson.


My application doesn’t have a display; it just creates the overlaid frames and saves them to the SD card. The overlay is generated by combining a static image with dynamically positioned images and text. The video will be written to SD at 60 FPS, and the overlay needs to refresh at ~25 Hz. I’m considering the following options:

  1. Create a separate Python process that uses jetson-utils to combine the images and add the text, then pass the result through shared memory to the process writing to SD.
  2. Run PyQt to generate the overlay (could it keep up at 25 FPS?) and then, as above, pass the image through shared memory to the main process.
  3. Other options that I might not be seeing right now?
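Whichever option you choose, running a ~25 Hz overlay inside a 60 FPS video loop is largely a scheduling question. At 60 FPS each frame has a budget of 1000/60 ≈ 16.7 ms, while the overlay period is 1000/25 = 40 ms, so the overlay only needs rebuilding on roughly every second or third frame and the last result can be reused in between. Below is a minimal stdlib sketch of that per-frame decision; the class `OverlayScheduler` is hypothetical and not part of jetson-utils.

```python
class OverlayScheduler:
    """Decide, per video frame, whether the overlay should be regenerated.

    Video runs at ~60 FPS while the overlay only needs ~25 Hz, so most frames
    reuse the last composited overlay instead of rebuilding it.
    """

    def __init__(self, rate_hz: float) -> None:
        self.period = 1.0 / rate_hz
        self.next_due = None  # time (s) at which the next refresh is due

    def should_refresh(self, now: float) -> bool:
        if self.next_due is None or now >= self.next_due:
            if self.next_due is None or now - self.next_due > self.period:
                # First frame, or we fell more than a period behind: resync.
                self.next_due = now + self.period
            else:
                # Keep a steady cadence instead of drifting with frame jitter.
                self.next_due += self.period
            return True
        return False
```

In the capture loop you would call `should_refresh(time.monotonic())` once per frame, regenerate the overlay image only when it returns True, and composite the cached overlay onto every frame.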

While testing, I noticed one odd thing: rendering a frame to the configured video output takes at least 16 ms per frame (i.e., 60 FPS). Is that the physical limit of the Jetson Nano for encoding to H264 and writing it to a file?


I had a brainwave and added a cudaDeviceSynchronize() call after my overlays, as I had a sneaky suspicion that those calls were non-blocking. They were! Now I can see that the final stage, encoding the frame and writing it to SD, takes ~7-9 ms. Is that roughly what I should expect?
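The observation above generalizes: because CUDA kernel launches return before the GPU has finished, wall-clock timing of a stage is only meaningful if you synchronize before stopping the clock, and also before starting it so that work queued by earlier stages isn't billed to this one. A small helper, assuming you pass in the synchronization function; `timed` is a hypothetical name.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(stage: Callable[[], T],
          sync: Callable[[], None] = lambda: None) -> Tuple[T, float]:
    """Run one pipeline stage and return (result, elapsed milliseconds).

    `sync` should block until all queued GPU work is done. Without it, the
    GPU time of this stage would be billed to whichever later call blocks.
    """
    sync()  # drain work queued by earlier stages so it isn't counted here
    start = time.perf_counter()
    result = stage()
    sync()  # wait for this stage's GPU work to complete
    return result, (time.perf_counter() - start) * 1000.0
```

On the Jetson you would pass `sync=cudaDeviceSynchronize` imported from `jetson_utils` (`jetson.utils` on older installs), e.g. `_, ms = timed(lambda: draw_overlays(img), sync=cudaDeviceSynchronize)`, where `draw_overlays` stands in for your own overlay code. Keep these synchronizations to profiling runs: each one stalls the CPU until the GPU is idle, which removes the CPU/GPU overlap you want in the final application.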

@dusty_nv, I searched the forums but couldn’t find a clear answer, so I’m hoping you can clarify this for me: is it possible to run CUDA operations (mainly overlaying) at the same time as H264 HW encoding? Can this be done using Jetson Utils? I’m trying to determine the best way to parallelize my application.

Thank you!

@alexx88 yes, you can be processing a different video frame with CUDA while H264 HW encoding is occurring in the background. I wouldn’t separate these into different processes, because it’s already threaded underneath in C++ and in the kernel, and you would incur overhead from sharing video across processes at the Python level.

So essentially I should build the pipeline in a single Python process, since the CUDA and underlying operations are multithreaded anyway. Is my understanding correct?

PS: When you say multithreaded in the kernel, are you referring to the Linux kernel or the CUDA kernels?

Thank you!

Yes, many of the CUDA functions in jetson_utils are asynchronous until cudaDeviceSynchronize() or cudaStreamSynchronize() is called. The videoInput/videoOutput interfaces are also threaded underneath.

I had meant how the CUDA kernel launches are non-blocking.
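Putting these answers together, the application can stay a plain loop in one Python process. The sketch below takes the source and output objects as parameters so it can be exercised without hardware; it assumes they expose the `IsStreaming()`, `Capture()` and `Render()` methods of jetson-utils' `videoSource`/`videoOutput`, and that `Capture()` returns `None` on a timeout. `run_pipeline` and `draw_overlay` are hypothetical names.

```python
from typing import Any, Callable, Optional


def run_pipeline(source: Any, output: Any,
                 draw_overlay: Callable[[Any], None],
                 max_frames: Optional[int] = None) -> int:
    """Capture -> overlay -> encode loop in a single process.

    Capture and encoding are threaded inside the library and many CUDA calls
    are asynchronous, so no extra Python threads or processes are added here.
    Returns the number of frames rendered.
    """
    frames = 0
    while source.IsStreaming() and output.IsStreaming():
        img = source.Capture()
        if img is None:        # capture timed out; try again
            continue
        draw_overlay(img)      # composite cached overlay + text onto the frame
        output.Render(img)     # hand the frame to the encoder/file output
        frames += 1
        if max_frames is not None and frames >= max_frames:
            break
    return frames
```

On the device, `source` would be `videoSource("csi://0")` and `output` would be `videoOutput("file://overlay.mp4")`, with `draw_overlay` being your own function that composites the cached overlay and text onto the captured CUDA image.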


Thank you! All is clear now.

Regarding audio, I can see there’s nothing in Jetson Utils that deals with it. After looking at the code for videoOutput, I’d guess that my only course of action would be to extend videoOutput to support audio options, which would include adding the corresponding gstEncoder options and so on. Does this sound like roughly the right approach? Are there any plans to add audio support to Jetson Utils?

Yes, if your audio stream is interleaved with the video stream, then that would probably be the right approach. I don’t currently have plans to add audio support to jetson-utils, although DeepStream already supports it.
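If you do extend the encoder, the key change is structural: audio becomes a second branch feeding the same muxer as the H264 branch, and the muxer interleaves the two streams into one file. Below is a sketch of a GStreamer launch description such an extension might build. The element chains are assumptions to adapt: `voaacenc` may not be installed (`avenc_aac` from gst-libav is an alternative), and in a real gstEncoder extension the video branch would be whatever it already builds after its appsrc. `build_av_pipeline` is a hypothetical name.

```python
DEFAULT_VIDEO = (
    # Assumed chain: raw frames pushed by the application into an appsrc
    # (caps set from code), converted to NVMM memory and HW-encoded to H264.
    "appsrc name=video_src is-live=true format=time "
    "! nvvidconv ! nvv4l2h264enc ! h264parse"
)
DEFAULT_AUDIO = (
    # Assumed chain: ALSA capture encoded to AAC.
    "alsasrc ! audioconvert ! audioresample ! voaacenc ! aacparse"
)


def build_av_pipeline(location: str,
                      video_branch: str = DEFAULT_VIDEO,
                      audio_branch: str = DEFAULT_AUDIO,
                      muxer: str = "matroskamux") -> str:
    """Return a gst-launch style description muxing audio and video to a file.

    Each branch ends in its own queue, giving it its own streaming thread, so
    one branch stalling does not block the other while the muxer waits for
    data on both of its pads.
    """
    return (
        f"{muxer} name=mux ! filesink location={location} "
        f"{video_branch} ! queue ! mux. "
        f"{audio_branch} ! queue ! mux."
    )
```

The string can be handed to `Gst.parse_launch()` from PyGObject, or tried piecewise with `gst-launch-1.0` after swapping the appsrc for a test source such as `videotestsrc`. Matroska is used as the default here because an MP4 file written by `mp4mux` is only finalized when the pipeline receives EOS, so a recording cut off by power loss on the SD card may be unreadable; treat that choice as a judgment call.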
