How can I capture images that CUDA can use directly?

Hello,

I was using OpenCV capture to get images from an RTSP stream.
(using GStreamer rtspsrc and an H.265 decoder)

Before processing the data, I have to call cudaMemcpy to copy the data to device memory.

When the resolution is 2K, this copy takes about 1 ms.
When the resolution is 4K, it takes less than 4 ms.
Those are not very long.
But when the resolution is 8K, the copy takes about 16 ms, which is really a long time.
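For scale, here is a back-of-envelope sketch of the copy sizes involved, assuming RGBA output (4 bytes per pixel) and the usual 2048x1080 / 3840x2160 / 7680x4320 pixel counts for 2K/4K/8K (these resolutions are assumptions, not from the original post):

```cpp
#include <cstdint>

// Bytes per frame for RGBA output from nvvidconv (4 bytes per pixel).
std::uint64_t frameBytesRGBA(std::uint64_t w, std::uint64_t h) {
    return w * h * 4;
}

// Effective copy bandwidth in GB/s for `bytes` copied in `ms` milliseconds.
double copyBandwidthGBs(std::uint64_t bytes, double ms) {
    return (static_cast<double>(bytes) / 1e9) / (ms / 1e3);
}

// An 8K RGBA frame is 7680 * 4320 * 4 = 132,710,400 bytes (~133 MB),
// so copying it in 16 ms corresponds to roughly 8.3 GB/s of the shared
// DRAM bandwidth -- spent purely on moving data that is already in DRAM.
```

This is why avoiding the copy entirely (mapping the decoded NVMM buffer into CUDA) is attractive on Jetson, where CPU and GPU share the same DRAM.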


Since the Jetson board has a single shared DRAM,
I wonder: is there a way to capture images that CUDA can use directly?
Or am I using OpenCV capture incorrectly?

Thanks

Hi,
An optimal solution is to run a GStreamer pipeline and grab the buffer in appsink, like:

rtspsrc ! rtph265depay ! h265parse ! nvv4l2decoder ! appsink

And use the NvBufSurface APIs to map the buffer to a cv::cuda::GpuMat. Here is a sample for JetPack 4:
Nano not using GPU with gstreamer/python. Slow FPS, dropped frames - #8 by DaneLLL

The sample cannot be applied directly since the NvBuffer APIs are deprecated on JetPack 5. Please also refer to this patch, which uses the NvBufSurface APIs:
How to create opencv gpumat from nvstream? - #18 by DaneLLL

Thanks for the reply.

Actually I am already using a GStreamer pipeline like this for the OpenCV capture:
rtspsrc ! rtph265depay ! h265parse ! nvv4l2decoder ! nvvidconv ! video/x-raw,format=RGBA ! appsink
(the nvvidconv ! video/x-raw,format=RGBA stage is required; without it the image comes out gray)
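In C++ this pipeline is usually handed to OpenCV as a single string (a sketch only; the RTSP URL is a placeholder, and OpenCV must be built with GStreamer support for this to work):

```cpp
#include <string>

// Build the capture pipeline used above; `url` is a placeholder RTSP address.
std::string buildPipeline(const std::string& url) {
    return "rtspsrc location=" + url +
           " ! rtph265depay ! h265parse ! nvv4l2decoder"
           " ! nvvidconv ! video/x-raw,format=RGBA ! appsink";
}

// Typical use (requires OpenCV built with GStreamer):
//   cv::VideoCapture cap(buildPipeline("rtsp://..."), cv::CAP_GSTREAMER);
```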

And then I call cudaMemcpy to copy each frame into device memory, which takes 16 ms at 8K resolution.
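As a rough, GPU-free way to see how expensive moving a frame of that size is, one can time a plain host-side memcpy of an 8K RGBA buffer (a sketch only; actual cudaMemcpy times differ and should be measured with CUDA events):

```cpp
#include <chrono>
#include <cstring>
#include <vector>

// Time a host-side memcpy of one 8K RGBA frame (~133 MB) in milliseconds.
// This stands in for the host-to-device cudaMemcpy discussed above; both
// cross the same shared DRAM on Jetson.
double timeHostCopyMs() {
    const std::size_t bytes = 7680ull * 4320ull * 4ull;
    std::vector<unsigned char> src(bytes, 1), dst(bytes);
    auto t0 = std::chrono::steady_clock::now();
    std::memcpy(dst.data(), src.data(), bytes);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```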

I will check and try the links you posted.

And one more thing: I also tried using jetson-utils to get images.
But it seems that JetPack 5.x has a bug with NVMM buffers.

https://github.com/dusty-nv/jetson-utils/blob/master/codec/gstBufferManager.h#L41

I tried editing the code to force NVMM usage, but it did not work well.

Is there any way, or any schedule, to fix this NVMM problem?

Thanks.

Hi,
I have read the link you posted.
I think it is about writing a GStreamer plugin.
I just want to get a GPU memory pointer that I can use to process images in a normal C++ program, without a host-to-device cudaMemcpy.

If OpenCV capture cannot do this, I can use other libraries.
(I thought jetson-utils would solve this problem, but as I said in my last post, it seems there is a bug in JetPack 5.x.)

Hi,
For OpenCV, your method is fine. There is no further room for improvement.

The issue in jetson-utils is under investigation.

Thanks for your reply.

Maybe jetson-utils is the best way to use the image through a device pointer directly.

@Up2U it has been since the JetPack 5.0.0 Developer Preview that I last tried using NVMM with the V4L2 codecs, so you might want to try again on the latest JetPack 5.0.2 and comment out that #undef. I’m not sure if the API for using NVMM with nvv4l2decoder changed or if it was actually a bug.

Hi,
I tried with JetPack 5.0.1, and it did not work.
I have not updated to JetPack 5.0.2 yet.
I will try to do that.

@dusty_nv Hi, I have updated to JetPack 5.0.2.
And it behaves the same as JetPack 5.0.1.
(With that #undef commented out, video-viewer went wrong; with the #undef in place, the image showed OK.)

OK thanks, I will make a note to look into it again on my end. It actually looks like it may be a different error related to your 8K 10-bit format, as I have not seen that particular format-related error before.

Thanks for your reply.

About the video format, there is one other thing.
At startup, the frame rate was recognized as 59.94, while it actually is 29.97.

After some time, the fps in the video-viewer title bar went to the correct value.


@dusty_nv And there is one more thing: in videoSource, the fps field is typed as int, which seems to be a mistake.

Ahh okay, thank you @Up2U for noticing that. I will make a note to look into changing it, and into what side effects that may have for code already using the videoSource::GetFrameRate() function.

@DaneLLL Hi, I found some issues after updating to 5.0.2.

  1. The cudaMemcpy from host to device, which is needed after OpenCV capture, is noticeably slower than on 5.0.1.
    For example, an 8K image took about 16 ms on 5.0.1 but now takes about 22 ms.
    A 4K image took about 4 ms on 5.0.1 but now takes about 5 ms.
  2. And video-viewer in jetson-utils can no longer show the 8K stream (it captures a few frames and then fails), whereas 5.0.1 was OK.
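Taking the reported times as exact, the relative slowdown works out as follows (a quick sketch of the arithmetic, not a measurement):

```cpp
// Percentage slowdown from a before/after timing pair.
double slowdownPercent(double before_ms, double after_ms) {
    return (after_ms - before_ms) / before_ms * 100.0;
}

// 8K: 16 ms -> 22 ms is a 37.5% slowdown.
// 4K:  4 ms ->  5 ms is a 25% slowdown.
```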

Both issues occur with the same camera.

Has anybody else met the same issues after switching to 5.0.2?

Hi @Up2U, is this 8K video source from a file, or from an RTP/RTSP stream? If it’s a network stream, I’m inclined to think the connection timed out or something, since it was successfully capturing a bunch of frames before. It would be interesting to know if it happens with a video file read from disk too.

@dusty_nv Hi.

  1. I did the 8K test with the same camera, a ZCAM E2F. The 8K video source is an RTSP stream, and I set the bitrate to 5 Mbps or 50 Mbps for the tests, which is below the decoder's limit.

http://www.z-cam.com/e2-f8/

I have tried retrieving this 8K stream on a Windows PC (GeForce GTX 1070) with VLC and OBS, and the video played well. Using FFmpeg with nvcodec also worked well.

  2. And I have done a 4K test with another camera, a ZCAM E2C.

http://www.z-cam.com/e2c/

4K went well,
although the memcpy was slower than on 5.0.1, as I mentioned in the last post.

@dusty_nv Hi, I have done some tests on video-viewer with Nsight Systems.
I found that the behavior of the nvv4l2decoder plugin is quite different:
At 4K, there is a long ‘ppoll’ block in every period.

But at 8K, instead of the ‘ppoll’ block, there is an ‘ioctl’ block.

I wonder why this difference occurs.

And we can see from the 8K screenshot that while one frame is being processed, the next frame has already arrived.

Here are the Nsight Systems data files:
video-viewer-4k.nsys-rep (3.3 MB)
video-viewer-8k-slow.nsys-rep (8.0 MB)

Hi @Up2U, sorry I don’t have much insight into the nvv4l2decoder element, just the jetson-inference part, so you may want to create a new topic about that. Do you notice similar behavior if you run a standalone GStreamer pipeline with gst-launch-1.0 or with DeepStream? DeepStream is more optimized for high-bandwidth applications than jetson-inference is.

Thanks for your reply. @dusty_nv
I have also run gst-launch-1.0 at both 4K and 8K. Neither produces a smooth video.
I cannot make sense of this program's timeline; it is more complicated.
The nvv4l2decoder behavior under gst-launch-1.0 is not similar to that under video-viewer.

Here are the data files:
gst-4k.nsys-rep (3.5 MB)
gst-8k.nsys-rep (3.7 MB)

I have not used DeepStream yet. I will take a look at it.