How can I capture images that CUDA can use directly?

Hello,

I was using OpenCV capture to get images from an RTSP stream.
(using GStreamer rtspsrc and an H.265 decoder)

Before processing the data, I have to call cudaMemcpy to copy the data to device memory.

When the resolution is 2K, this copy takes about 1 ms.
When the resolution is 4K, it takes less than 4 ms.
Those are not very long.
But when the resolution is 8K, the copy takes about 16 ms, which is really a long time.
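For scale, here is a back-of-envelope sketch of the copy sizes involved, assuming RGBA output (4 bytes per pixel) and the usual 2048x1080 / 3840x2160 / 7680x4320 pixel counts for 2K/4K/8K (these resolutions are assumptions, not from the original post):

```cpp
#include <cstdint>

// Bytes per frame for RGBA output from nvvidconv (4 bytes per pixel).
std::uint64_t frameBytesRGBA(std::uint64_t w, std::uint64_t h) {
    return w * h * 4;
}

// Effective copy bandwidth in GB/s for `bytes` copied in `ms` milliseconds.
double copyBandwidthGBs(std::uint64_t bytes, double ms) {
    return (static_cast<double>(bytes) / 1e9) / (ms / 1e3);
}

// An 8K RGBA frame is 7680 * 4320 * 4 = 132,710,400 bytes (~133 MB),
// so copying it in 16 ms corresponds to roughly 8.3 GB/s of the shared
// DRAM bandwidth -- spent purely on moving data that is already in DRAM.
```

This is why avoiding the copy entirely (mapping the decoded NVMM buffer into CUDA) is attractive on Jetson, where CPU and GPU share the same DRAM.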


Since the Jetson board has a single shared DRAM,
I wonder: is there a way to capture images that CUDA can use directly?
Or am I using OpenCV capture incorrectly?

Thanks

Hi,
An optimal solution is to run a GStreamer pipeline and grab the buffer in appsink, like:

rtspsrc ! rtph265depay ! h265parse ! nvv4l2decoder ! appsink

And use the NvBufSurface APIs to map the buffer to a cv::cuda::GpuMat. Here is a sample for JetPack 4:
Nano not using GPU with gstreamer/python. Slow FPS, dropped frames - #8 by DaneLLL

The sample cannot be applied directly since the NvBuffer APIs are deprecated on JetPack 5. Please also refer to this patch, which uses the NvBufSurface APIs:
How to create opencv gpumat from nvstream? - #18 by DaneLLL

Thanks for the reply.

Actually I am already using a GStreamer pipeline like this for the OpenCV capture:
rtspsrc ! rtph265depay ! h265parse ! nvv4l2decoder ! nvvidconv ! video/x-raw,format=RGBA ! appsink
(the nvvidconv ! video/x-raw,format=RGBA stage is required; without it the image comes out gray)
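In C++ this pipeline is usually handed to OpenCV as a single string (a sketch only; the RTSP URL is a placeholder, and OpenCV must be built with GStreamer support for this to work):

```cpp
#include <string>

// Build the capture pipeline used above; `url` is a placeholder RTSP address.
std::string buildPipeline(const std::string& url) {
    return "rtspsrc location=" + url +
           " ! rtph265depay ! h265parse ! nvv4l2decoder"
           " ! nvvidconv ! video/x-raw,format=RGBA ! appsink";
}

// Typical use (requires OpenCV built with GStreamer):
//   cv::VideoCapture cap(buildPipeline("rtsp://..."), cv::CAP_GSTREAMER);
```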

And then I call cudaMemcpy to copy each frame into device memory, which takes 16 ms at 8K resolution.
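As a rough, GPU-free way to see how expensive moving a frame of that size is, one can time a plain host-side memcpy of an 8K RGBA buffer (a sketch only; actual cudaMemcpy times differ and should be measured with CUDA events):

```cpp
#include <chrono>
#include <cstring>
#include <vector>

// Time a host-side memcpy of one 8K RGBA frame (~133 MB) in milliseconds.
// This stands in for the host-to-device cudaMemcpy discussed above; both
// cross the same shared DRAM on Jetson.
double timeHostCopyMs() {
    const std::size_t bytes = 7680ull * 4320ull * 4ull;
    std::vector<unsigned char> src(bytes, 1), dst(bytes);
    auto t0 = std::chrono::steady_clock::now();
    std::memcpy(dst.data(), src.data(), bytes);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```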

I will check and try the links you posted.

And one more thing: I also tried using jetson-utils to get images.
But it seems that JetPack 5.x has a bug with NVMM buffers.

https://github.com/dusty-nv/jetson-utils/blob/master/codec/gstBufferManager.h#L41

I tried editing the code to force NVMM usage, but it did not work well.

Is there any way, or any schedule, to fix this NVMM problem?

Thanks.

Hi,
I have read the link you posted.
I think it is about writing a GStreamer plugin.
I just want to get a GPU memory pointer that I can use to process images in a normal C++ program, without a host-to-device cudaMemcpy.

If OpenCV capture cannot do this, I can use other libraries.
(I thought jetson-utils would solve this problem, but as I said in my last post, it seems there is a bug in JetPack 5.x.)

Hi,
For OpenCV, your method is fine. There is no further room for improvement.

The issue in jetson-utils is under investigation.

Thanks for your reply.

Maybe jetson-utils is the best way to use the image through a device pointer directly.

@Up2U it has been since the JetPack 5.0.0 Developer Preview that I last tried using NVMM with the V4L2 codecs, so you might want to try again on the latest JetPack 5.0.2 and comment out that #undef. I’m not sure if the API for using NVMM with nvv4l2decoder changed or if it was actually a bug.

Hi,
I tried with JetPack 5.0.1, and it did not work.
I have not updated to JetPack 5.0.2 yet.
I will try to do that.

@dusty_nv Hi, I have updated to JetPack 5.0.2.
And it behaves the same as JetPack 5.0.1.
(With that #undef commented out, video-viewer went wrong; with the #undef in place, the image showed OK.)

OK thanks, I will make a note to look into it again on my end. It actually looks like it may be a different error related to your 8K 10-bit format, as I have not seen that particular format-related error before.

Thanks for your reply.

About the video format, there is one other thing.
At startup, the frame rate was recognized as 59.94, while it actually is 29.97.

After some time, the fps in the video-viewer title bar went to the correct value.


@dusty_nv And there is one more thing: in videoSource, the fps field is typed as int, which seems to be a mistake.

Ahh okay, thank you @Up2U for noticing that. I will make a note to look into changing it, and into what side effects that may have for code already using the videoSource::GetFrameRate() function.

@DaneLLL Hi, I found some issues after updating to 5.0.2.

  1. The cudaMemcpy from host to device, which is needed after OpenCV capture, is noticeably slower than on 5.0.1.
    For example, an 8K image took about 16 ms on 5.0.1 but now takes about 22 ms.
    A 4K image took about 4 ms on 5.0.1 but now takes about 5 ms.
  2. And video-viewer in jetson-utils can no longer show the 8K stream (it captures a few frames and then fails), whereas 5.0.1 was OK.
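Taking the reported times as exact, the relative slowdown works out as follows (a quick sketch of the arithmetic, not a measurement):

```cpp
// Percentage slowdown from a before/after timing pair.
double slowdownPercent(double before_ms, double after_ms) {
    return (after_ms - before_ms) / before_ms * 100.0;
}

// 8K: 16 ms -> 22 ms is a 37.5% slowdown.
// 4K:  4 ms ->  5 ms is a 25% slowdown.
```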

Both issues occur with the same camera.

Has anybody else met the same issues after switching to 5.0.2?

Hi @Up2U, is this 8K video source from a file, or from an RTP/RTSP stream? If it’s a network stream, I’m inclined to think the connection timed out or something, since it was successfully capturing a bunch of frames before. It would be interesting to know if it happens with a video file read from disk too.

@dusty_nv Hi.

  1. I did the 8K test with the same camera, a ZCAM E2F. The 8K video source is an RTSP stream, and I set the bitrate to 5 Mbps or 50 Mbps for the tests, which is below the decoder's limit.

http://www.z-cam.com/e2-f8/

I have tried retrieving this 8K stream on a Windows PC (GeForce GTX 1070) with VLC and OBS, and the video played well. Using FFmpeg with nvcodec also worked well.

  2. And I have done a 4K test with another camera, a ZCAM E2C.

http://www.z-cam.com/e2c/

4K went well,
although the memcpy was slower than on 5.0.1, as I mentioned in the last post.

@dusty_nv Hi, I have done some tests on video-viewer with Nsight Systems.
I found that the behavior of the nvv4l2decoder plugin is quite different:
At 4K, there is a long ‘ppoll’ block in every period.

But at 8K, instead of the ‘ppoll’ block, there is an ‘ioctl’ block.

I wonder why this difference occurs.

And we can see from the 8K screenshot that while one frame is being processed, the next frame has already arrived.

Here are the Nsight Systems data files:
video-viewer-4k.nsys-rep (3.3 MB)
video-viewer-8k-slow.nsys-rep (8.0 MB)

Hi @Up2U, sorry I don’t have much insight into the nvv4l2decoder element, just the jetson-inference part, so you may want to create a new topic about that. Do you notice similar behavior if you run a standalone GStreamer pipeline with gst-launch-1.0 or with DeepStream? DeepStream is more optimized for high-bandwidth applications than jetson-inference is.

Thanks for your reply. @dusty_nv
I have also run gst-launch-1.0 at both 4K and 8K. Neither produces a smooth video.
I cannot make sense of this program's timeline; it is more complicated.
The nvv4l2decoder behavior under gst-launch-1.0 is not similar to that under video-viewer.

Here are the data files:
gst-4k.nsys-rep (3.5 MB)
gst-8k.nsys-rep (3.7 MB)

I have not used DeepStream yet. I will take a look at it.