Force rendering on CPU on Jetson Xavier

Hi,

thanks for the Jetson Xavier, it’s a really great device.

I’m currently building an application that captures a video stream (using OpenCV from /dev/video0), detects objects using a custom TensorFlow model, and renders the captured image in fullscreen with the bounding boxes (using OpenCV), all at 30 FPS.

In order to satisfy the real-time requirements, my application has multiple threads: I’m predicting and rendering at the same time.
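To give an idea of the structure, here is a stripped-down sketch of what the threads do (the TensorFlow model is replaced by a dummy stand-in and the box format is simplified, so this is only illustrative, not my actual code):

```python
import queue
import threading

import cv2


class DummyModel:
    """Stand-in for my custom TensorFlow model (placeholder only)."""
    def predict(self, frame):
        h, w = frame.shape[:2]
        return [(w // 4, h // 4, 3 * w // 4, 3 * h // 4)]  # fake box


frames = queue.Queue(maxsize=1)    # latest captured frame
results = queue.Queue(maxsize=1)   # latest (frame, boxes) pair


def put_latest(q, item):
    # Drop the stale item so consumers always see the newest frame.
    try:
        q.get_nowait()
    except queue.Empty:
        pass
    q.put(item)


def capture_loop(cap):
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        put_latest(frames, frame)


def inference_loop(model):
    while True:
        frame = frames.get()
        boxes = model.predict(frame)           # inference on the GPU
        put_latest(results, (frame, boxes))


def render_loop():
    cv2.namedWindow("out", cv2.WINDOW_NORMAL)
    cv2.setWindowProperty("out", cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
    while True:
        frame, boxes = results.get()
        for x1, y1, x2, y2 in boxes:
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.imshow("out", frame)               # rendering in fullscreen
        if cv2.waitKey(1) == 27:               # Esc quits
            break


cap = cv2.VideoCapture("/dev/video0")
threading.Thread(target=capture_loop, args=(cap,), daemon=True).start()
threading.Thread(target=inference_loop, args=(DummyModel(),), daemon=True).start()
render_loop()
```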

However, it looks like rendering and predicting at the same time creates contention on the GPU and slows both of them down.

Questions:

  • Is it possible to force rendering on the CPU so the GPU is used only for predicting?
  • Why is there contention when rendering and predicting at the same time? Rendering alone takes up to 20% of the GPU and the model alone takes up to 60% (measured using jtop), so even combined I’m not using 100%. If you can provide technical details on why this happens, it would be greatly appreciated.

Thanks in advance,

Hi,
We have implementations in gstreamer and tegra_multimedia_api. You can check

https://developer.nvidia.com/embedded/dlc/l4t-multimedia-api-reference-32-1

In gstreamer, we have nv3dsink and nvdrmvideosink.
In tegra_multimedia_api, we have NvEglRenderer and NvDrmRenderer.

The samples below show how to hook these into OpenCV, for your reference.
https://devtalk.nvidia.com/default/topic/1022543/jetson-tx2/gstreamer-nvmm-lt-gt-opencv-gpumat/post/5311027/#5311027
https://devtalk.nvidia.com/default/topic/1047563/jetson-tx2/libargus-eglstream-to-nvivafilter/post/5319890/#5319890
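If your OpenCV build has GStreamer support, one way to try nv3dsink from Python is to push frames through a VideoWriter pipeline, roughly like the untested sketch below (the caps and element options are assumptions and may need adjustment for your setup):

```python
import cv2

# Untested sketch: hand BGR frames from OpenCV to nv3dsink via GStreamer.
# Requires an OpenCV build with GStreamer support; caps/elements may need tuning.
WIDTH, HEIGHT, FPS = 1280, 720, 30

pipeline = (
    "appsrc ! video/x-raw,format=BGR ! videoconvert ! "
    "nvvidconv ! video/x-raw(memory:NVMM),format=NV12 ! nv3dsink sync=false"
)
writer = cv2.VideoWriter(pipeline, cv2.CAP_GSTREAMER, 0, FPS, (WIDTH, HEIGHT), True)

cap = cv2.VideoCapture("/dev/video0")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    writer.write(cv2.resize(frame, (WIDTH, HEIGHT)))

cap.release()
writer.release()
```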

Hi DaneLLL,

thank you for your answer. However, it’s not clear to me what the gain would be since, IIRC, both EGL and DRM use the GPU. Based on my experiments, when I render a picture in fullscreen using cv2.imshow() and run inference with my model at the same time, the concurrent GPU use results in contention that slows down the inference time. Can you confirm?

Hi,
Not sure how cv2.imshow() is implemented. Other users may share their experience. You may also check and compare tegrastats to get further information.

Hi DaneLLL,

thanks for your answer.

I’ve done some experiments with tegrastats. Here are the results, along with some more context:

  • Jetson Xavier with Ubuntu 18.04 installed via JetPack; I replaced Unity with XFCE for memory reasons.
  • From XFCE, I’m running a small application using OpenCV (installed from JetPack) to grab frames from the camera, detect objects using TensorFlow and display the frames with the predicted bounding boxes in full screen.
  • If I run these steps sequentially, the inference time of my model is 33 ms.
  • If I run them in parallel (inferring and displaying frames at the same time), the inference time is 40 ms.
  • I ran my application without predicting (only grabbing & displaying frames in full screen; a minimal repro is sketched below) and, using tegrastats, I clearly see that the GPU is used (~5-10%).
  • Same result with a Qt application: the GPU is used.
  • Looks like something in the stack is using hardware acceleration, maybe X?
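The display-only test mentioned above boils down to a loop like this (minimal repro sketch, run while watching the GR3D_FREQ field of tegrastats in another terminal):

```python
import cv2

# Minimal display-only repro (no inference): grab from /dev/video0 and show
# fullscreen with cv2.imshow(), while `sudo tegrastats` runs in another
# terminal to watch GR3D_FREQ (GPU load).
cap = cv2.VideoCapture("/dev/video0")
cv2.namedWindow("out", cv2.WINDOW_NORMAL)
cv2.setWindowProperty("out", cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("out", frame)
    if cv2.waitKey(1) == 27:    # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```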

Questions:

  1. Is it normal that using the GPU for rendering and inferring at the same time slows down the inference time? The model I use takes only 2 GB of the 16 GB of RAM. Is it possible for the GPU to render and run GPGPU compute in parallel, or is there context switching?
  2. What are the options to use the GPU only for inference while still displaying pictures on screen? Is it possible to disable hardware acceleration for display on the Jetson so that rendering is done on the CPU? Would that be too slow?
  3. You mentioned nv3dsink, nvdrmvideosink, NvEglRenderer and NvDrmRenderer; how can they help?

Thanks for your help!

Hi,
We don’t have experience with XFCE and OpenCV. Other users may share their experience.

Supported renderers are listed in #2. You may check whether you can integrate these renderers into your implementation. If you see issues when using the renderers, please share a gstreamer pipeline or tegra_multimedia_api sample so that we can reproduce it.