Design and architecture guidance

cloud9ine · January 15, 2021, 1:11am

Hi, I am looking to do the following. I want to have the Jetson Nano accept a high resolution camera video input from the CSI input, crop a specific region of it (conforming to a specific aspect ratio), scale it to a constant resolution (conforming to the same aspect ratio), and output it, say, to a display sink.
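The crop step described here comes down to a little geometry: take a region of interest, grow it to the required aspect ratio, and keep it inside the frame. The following is a minimal sketch of that calculation in plain Python. The function name `fit_crop` and its parameters are hypothetical, not part of GStreamer or jetson-utils, and the sketch assumes the region has a positive width and height.

```python
def fit_crop(roi, aspect, frame_w, frame_h):
    """Return a (left, top, right, bottom) crop with the given aspect ratio.

    roi is (left, top, right, bottom) in pixels; aspect is width / height.
    The crop is centred on roi, grown to the aspect ratio, shrunk if it would
    exceed the frame, and shifted so that it lies inside the frame.
    """
    left, top, right, bottom = roi
    w, h = right - left, bottom - top
    # Grow the shorter side so that w / h equals the target aspect ratio.
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect
    # If the crop is larger than the frame, shrink it uniformly to fit.
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale
    # Centre on the ROI, then shift back inside the frame if needed.
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x0 = min(max(cx - w / 2, 0), frame_w - w)
    y0 = min(max(cy - h / 2, 0), frame_h - h)
    return (int(round(x0)), int(round(y0)),
            int(round(x0 + w)), int(round(y0 + h)))


# A square region in a 4032x3040 frame, widened to 16:9.
print(fit_crop((1000, 1000, 1400, 1400), 16 / 9, 4032, 3040))
```

Because every crop has the same aspect ratio, scaling it to a constant output resolution never distorts the image.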

I know how to accomplish this using a gstreamer pipeline statically.

The part that I want to change is that I want to sample and inspect some of the frames at a set rate (for example, every 30th frame) using jetson-inference, and adjust the settings of the crop element based on the results of that inference.

What would be the best way to architect it? Would it be best to use a gstreamer pipeline to stream the source to a display sink, tee the video before the crop element, and use an appsink to feed it to the analysis portion? Or is there another structure that might work better?

DaneLLL · January 15, 2021, 5:08am

Hi,
In jetson-inference, there is gl-display. Probably it is better to use the implementation directly.

cloud9ine · January 15, 2021, 3:26pm

I am not clear what you mean. How can I use gldisplay in this context to display a fixed-size window into which a specific subportion of the input video is scaled and shown, with the coordinates of the subportion and the associated scaling parameters being modified dynamically?

cloud9ine · January 15, 2021, 3:28pm

Looks like I should use gldisplay render instead of renderonce for video, and setviewport to set the subarea of the video to show? I noticed that the jetson-inference tools modify the video in place to insert the detections. What if I want to use the detections to crop the video but I don’t want any bounding boxes etc. added to the video stream?

cloud9ine · January 15, 2021, 3:52pm

Where can I find documentation on gldisplay?

cloud9ine · January 17, 2021, 6:21am

Hi @DaneLLL or @kayccc, could you please help provide some guidance on the best way to go about implementing this in terms of architecture? I want to add something I forgot to mention before. Here is what I am looking to do:

Camera image → undistort → image sampling point for inference → crop → scale → output (to display or file)

In parallel, I want to sample the undistorted image prior to the crop every N frames, analyze the image (run a detector on it), and use the results from the analysis of this frame and potentially N previously analyzed frames to determine parameters for the crop element. I am hoping to keep my video at 30 fps.

My questions are:

What is the best way to undistort the image most efficiently? If I use opencv procedures to determine a camera undistortion matrix, what is the best way to utilize that on the Jetson? It has to be efficient because I have to undistort every image. Is the opencv undistort element in gstreamer the way to go? Or is there a better option?
What is the best way to sample images after undistort but before crop with a sampling rate? i.e. perhaps sample one out of every 30 frames for analysis?
How can I set the crop parameters based on the analysis?
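The second and third questions can be illustrated with a small frame counter that runs the analysis on every Nth frame and averages the most recent results into the crop box used for all frames in between. This is only a sketch of the control logic in plain Python. The class name `CropController`, the `analyze` callback, and the simple averaging are assumptions made for illustration, and the detector and the crop element themselves are left out.

```python
from collections import deque


class CropController:
    """Run analysis on every Nth frame and smooth the resulting crop box."""

    def __init__(self, analyze, every_n=30, history=5, initial_box=None):
        self.analyze = analyze              # callable: frame -> (l, t, r, b) or None
        self.every_n = every_n
        self.boxes = deque(maxlen=history)  # most recent analysis results
        self.frame_index = 0
        self.current = initial_box

    def update(self, frame):
        """Call once per frame; returns the crop box to use for this frame."""
        if self.frame_index % self.every_n == 0:
            box = self.analyze(frame)
            if box is not None:
                self.boxes.append(box)
                n = len(self.boxes)
                # Average each coordinate over the retained history.
                self.current = tuple(
                    sum(b[i] for b in self.boxes) / n for i in range(4))
        self.frame_index += 1
        return self.current
```

Every frame still goes through crop and scale; only the analysis is skipped on the frames in between, which is what keeps the per-frame cost low.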

Thanks for any help you are able to provide. If you can give me a high-level idea and point me to any documentation, I can dive deep, learn more about this, and figure it out.

cloud9ine · January 17, 2021, 6:26am

Also, @kayccc I don’t think the tags are right. I am not asking about board design. I am asking about software design to accomplish a specific task using the Jetson Nano dev kit.

cloud9ine · January 18, 2021, 3:17am

I did some research today - let me know if I am on the right track. My two options seem to be:

Write it as a gstreamer cpp application with two pipelines, and an appsink and an appsrc. The code that handles data delivery to the appsink will use a counter to run inference on every N frames. It would then send a cropped version of the image to the appsrc and this pipeline would scale it as needed. For this, I am missing any kind of guidance on writing a gstreamer application on the nano. Should I just follow the regular gstreamer application development guide?
Write it using videosource and gldisplay. But I am hitting a wall of lack of documentation. For instance, I can’t even seem to figure out how to specify the input width or height or rotation to videosource or output window width or height to gldisplay. Or how to use detectNet without it overlaying a bounding box for the detections on the image. From looking at the files, it looks like it may be a true/false parameter for the last argument? I have looked through the getting started guide - it doesn’t seem to get me far enough along. Is there another place I should be looking?

cloud9ine · January 18, 2021, 4:48pm

Another potential method I thought of is to write my application as a GST plugin. Would it suit my needs better?

dusty_nv · January 18, 2021, 9:39pm

Hi @cloud9ine, DeepStream SDK has custom GStreamer plugins that you could leverage in writing your own application. If you were to write your own plugin, for performance it should use CUDA or libraries that use CUDA for image processing.

Regarding your questions about jetson-inference, you can find the documentation here:

GitHub - dusty-nv/jetson-inference: Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
jetson-inference/aux-image.md at master · dusty-nv/jetson-inference · GitHub

The bottom link has examples of cropping via C++/Python that use CUDA underneath. The C++ implementation of glDisplay has more functions than the Python bindings support. In general, the image processing is done with CUDA functions, then the final image is passed to glDisplay for rendering. It also supports other outputs, like saving to compressed video or streaming via RTP.

To have detectNet not overlay bounding boxes on the video, start the detectnet/detectnet.py app with --overlay=none. This passes the overlay flag to detectNet.Detect().

You can use detectNet without glDisplay - in C++ the image data is available via the data pointer, or in Python you can use these image accessors or numpy arrays: https://github.com/dusty-nv/jetson-inference/blob/master/docs/aux-image.md#accessing-image-data-in-python

cloud9ine · January 19, 2021, 6:33am

Hi @dusty_nv

Thank you so much for responding. I reviewed the api docs you posted, the deepstream SDK documentation, as well as the info on how to access image data in python.

The detectnet api info is clear. I can call detectnet.detect with image, width, height, and none for the last parameter to avoid overlay on the image. I can then use the detections to crop the image and use gldisplay to render it. So far so good.
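Turning the detections into a single region to crop can be sketched as follows. The `Detection` namedtuple is a stand-in so that the example runs without a Jetson; it assumes the real detection objects expose `Confidence`, `Left`, `Top`, `Right` and `Bottom` attributes. The helper `union_box` and its threshold are hypothetical.

```python
from collections import namedtuple

# Stand-in for the detection objects; on the device these would come from
# the detector's Detect() call with the overlay disabled.
Detection = namedtuple("Detection", "ClassID Confidence Left Top Right Bottom")


def union_box(detections, min_confidence=0.5):
    """Bounding box (left, top, right, bottom) around all confident
    detections, or None when nothing passes the threshold."""
    kept = [d for d in detections if d.Confidence >= min_confidence]
    if not kept:
        return None
    return (min(d.Left for d in kept), min(d.Top for d in kept),
            max(d.Right for d in kept), max(d.Bottom for d in kept))


detections = [
    Detection(1, 0.9, 100, 200, 300, 400),
    Detection(1, 0.8, 250, 150, 500, 350),
    Detection(2, 0.2, 0, 0, 10, 10),      # dropped: below the threshold
]
print(union_box(detections))
```

Returning None for an empty result lets the caller keep the previous crop when a sampled frame contains no detections.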

The jetson utils api documentation is missing a lot of the info I am looking for (or maybe I am not looking at it correctly). For instance, how do I initialize gstCamera or videoSource to use CSI MIPI camera with a resolution of 4032 x 3040, a frame rate of 30 fps, and a flip-method of 1, for instance? Similarly, what if I crop the image using the image data manipulations you linked and scale the image to a 720 x 1280 image that I want to show using a glDisplay window of 1280x720 by flipping the image clockwise again?

I tried passing these arguments while initializing gstCamera and videoSource as a second argument string, as part of the input URI string, and other ways, and I simply cannot get it to work. For example,

camera = jetson.utils.videoSource("csi://0 --input-width=4032 --input-height=3040 --flip-method=1")

or

camera = jetson.utils.videoSource("csi://0", "--input-width=4032 --input-height=3040 --flip-method=1")

Similarly for glDisplay as well.

None of it seems to affect the gstreamer pipeline that gets used in the background for the input stream, or the properties of the output window. If I am just being completely obtuse, please let me know. I feel that jetson-inference and jetson-utils have almost everything I need if I could figure out how to control these parameters. DeepStream might be a bit too deep for me, and I don’t need to run inference on every frame, so I am hoping to use a counter and do it selectively. jetson-utils and jetson-inference look like the way to go if I could figure out these details.

cloud9ine · January 21, 2021, 8:45pm

Sorry to bump this again.

I have made progress on turning off overlay and processing the detections.

Looks like the way to control videoSource is to use a video options struct. I’m hitting a dead end though on how to set up and pass this structure from python when calling videoSource. Could you please point me to an example?

dusty_nv · January 21, 2021, 9:46pm

Unfortunately there aren’t Python bindings for this struct - instead, you can pass a list of options (in command-line format) to the argv keyword argument of videoSource. For example:

input = jetson.utils.videoSource("/dev/video0", argv=["--input-width=640", "--input-height=480"])

Here are the arguments available:

jetson-inference/docs/aux-streaming.md at master · dusty-nv/jetson-inference · GitHub
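Since the options travel as command-line style strings, a small helper can build the list from keyword arguments. This is a sketch: `to_argv` is a hypothetical helper, not part of jetson-utils. Only `--input-width` and `--input-height` are taken from the reply above, so any other flag name should be checked against the linked page before use.

```python
def to_argv(**options):
    """Turn keyword options into command-line style flags,
    e.g. input_width=640 becomes "--input-width=640"."""
    return ["--{}={}".format(name.replace("_", "-"), value)
            for name, value in options.items()]


argv = to_argv(input_width=4032, input_height=3040)
print(argv)

# On a Jetson with jetson-utils installed, the list would then be passed as
# in the reply above:
#   camera = jetson.utils.videoSource("csi://0", argv=argv)
```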

cloud9ine · January 22, 2021, 6:39pm

I’ve made significant progress on this. I’m able to do all the required operations in cuda except two operations which are causing my fps to drop to 12-13.

A 90 degree flip. Is there any way to do it in cuda? I see the image manipulation functions include crop and resize but not any kind of flip. I don’t see a flip argument for videoOutput either.
Padding. Is there a way to pad an image with single colour border?
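For reference, both operations are pure index remappings, which is what makes them good candidates for a GPU kernel with one thread per pixel. The sketch below shows the mappings on the CPU in plain Python, using a list of rows as the image. It illustrates the arithmetic only and is not the jetson-utils implementation. The function names are hypothetical.

```python
def rotate90_cw(src):
    """Rotate a row-major image (list of rows) 90 degrees clockwise."""
    h, w = len(src), len(src[0])
    dst = [[None] * h for _ in range(w)]   # output is w rows by h columns
    for y in range(h):
        for x in range(w):
            # Source pixel (x, y) lands in row x, column h - 1 - y.
            dst[x][h - 1 - y] = src[y][x]
    return dst


def pad(src, border, color):
    """Surround the image with `border` pixels filled with `color`."""
    h, w = len(src), len(src[0])
    dst = [[color] * (w + 2 * border) for _ in range(h + 2 * border)]
    for y in range(h):
        for x in range(w):
            dst[y + border][x + border] = src[y][x]
    return dst


print(rotate90_cw([[1, 2, 3], [4, 5, 6]]))
print(pad([[1]], 1, 0))
```

In a GPU version, the two nested loops would be replaced by the thread grid, with each thread computing its own (x, y) and writing one output pixel.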

dusty_nv · January 22, 2021, 7:48pm

Hi @cloud9ine, I don’t have these CUDA functions in my library currently, but they should be fairly simple should you wish to implement them. You can find some examples under jetson-inference/utils/cuda. In particular, these files have simpler functions which you could base your own kernels off of:

cloud9ine · January 22, 2021, 10:37pm

Thanks, @dusty_nv. I was able to implement my own cudaFlip function, add a python binding, and get it all to work. That part of the code is all good and fast but now glDisplay is taking 0.05 seconds to render each 1280x540 image. That puts my maximum achievable frame rate at 20 fps and because of the other code I’m running, I’m ending up at about 15fps.

Is there any way to speed up the rendering on glDisplay?

cloud9ine · January 22, 2021, 10:45pm

I should have said videoOutput in the comment above. I believe it’s using glDisplay in the background.

cloud9ine · January 22, 2021, 11:22pm

I used time() to do a rough profile of parts of my code. I tried two variations:

Complete image processing using CUDA, then transfer to OpenCV for display. This is what a typical loop looks like (I am only doing inference on one out of every 20 frames, so the first portion would be a bit longer on that one frame):

It took 0.000151872634888 seconds for everything until image manipulation using cuda.
It took 5.72204589844e-05 seconds for image manipulation using cuda.
It took 0.0481369495392 seconds to transfer image to opencv.
It took 0.0021071434021 seconds to render the image.

If I render using videoOutput, it becomes:

It took 0.000125885009766 seconds for everything until image manipulation using cuda.
It took 5.91278076172e-05 seconds for image manipulation using cuda.
It took 0.0488979816437 seconds to render the image.

So, opencv seems to be able to render faster but transferring from CUDA to OpenCV (CUDA to Numpy Array plus colorspace conversion) seems to take about as long as glDisplay takes to render the image. Is there any way I can bring down the rendering time by half?
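The frame-rate ceiling implied by these numbers can be worked out directly. The sketch below sums the per-stage timings quoted above and assumes the stages run one after another on a single thread. `max_fps` is a hypothetical helper.

```python
def max_fps(stage_seconds):
    """Upper bound on frame rate when the stages run one after another."""
    return 1.0 / sum(stage_seconds)


# Per-frame timings quoted above, in seconds.
opencv_path = [0.000151872634888, 5.72204589844e-05,
               0.0481369495392, 0.0021071434021]
video_output_path = [0.000125885009766, 5.91278076172e-05,
                     0.0488979816437]

for name, stages in [("OpenCV", opencv_path),
                     ("videoOutput", video_output_path)]:
    share = max(stages) / sum(stages)
    print("{}: at most {:.1f} fps, slowest stage is {:.0%} of frame time"
          .format(name, max_fps(stages), share))
```

Both paths come out at roughly 20 fps, and in each case a single stage accounts for almost all of the frame time. Reaching 30 fps means fitting everything into about 33 ms per frame.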

dusty_nv · January 23, 2021, 2:06am

@cloud9ine without doing further profiling inside glDisplay/glTexture, it would be hard to determine if it could be made faster. For display://* outputs, videoOutput uses glDisplay (glDisplay is an instance of videoOutput). When rendering textures, it uses CUDA<->OpenGL interoperability.

Can you tell if the rendering time is constant or does it vary with the resolution of the image?

cloud9ine · January 23, 2021, 7:53pm

Yes, I just checked. It’s taking twice as long to render a 1920x1080 image as it does to render a 1280x720 image.

Is there any alternative? For instance, if I build opencv with cuda support, would we be able to eliminate or speed up the data transfer from cuda to opencv?
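A quick check on the scaling: a 1920x1080 image has 2.25 times the pixels of a 1280x720 image, so "twice as long" is roughly what a per-pixel cost would produce. The sketch below works that out and, under the loud assumption that render time is strictly proportional to pixel count, estimates how many pixels would fit in a 30 fps frame budget. It uses the 0.05 s figure reported earlier for a 1280x540 image.

```python
def pixel_count(width, height):
    return width * height


ratio = pixel_count(1920, 1080) / pixel_count(1280, 720)
print("pixel ratio:", ratio)

# Assumption: render time is proportional to pixel count, with no fixed cost.
render_s = 0.05                       # reported earlier for a 1280x540 image
per_pixel_s = render_s / pixel_count(1280, 540)
budget_px = int((1.0 / 30) / per_pixel_s)
print("pixels renderable in 1/30 s under that assumption:", budget_px)
```

If the assumption holds, the estimate is only an upper bound, since it leaves no time for the rest of the loop.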
