I was using OpenCV capture to get images from an RTSP stream
(using GStreamer rtspsrc and an H.265 decoder).
Before processing the data, I have to call cudaMemcpy to copy the data to device memory.
When the resolution is 2K, this copy takes about 1 ms.
When the resolution is 4K, this copy takes just under 4 ms.
That is not very long.
But when the resolution is 8K, this copy takes about 16 ms, which is really a long time.
Since a Jetson board has only one DRAM, shared by the CPU and GPU,
I wonder whether there is a way to capture images that CUDA can use directly?
Or am I using OpenCV capture wrong?
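As a rough sanity check on the copy times quoted above, the sketch below works out the size of one RGBA frame and the throughput each copy implies. It assumes 8-bit RGBA (4 bytes per pixel) and that "4K" and "8K" mean 3840x2160 and 7680x4320; neither is stated in the post, and the helper names are made up for illustration.

```cpp
#include <cstdint>

// Bytes in one RGBA frame, assuming 8 bits per channel (4 bytes per pixel).
constexpr std::uint64_t rgba_frame_bytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 4;
}

// Effective throughput in GB/s (1 GB = 1e9 bytes) for a copy of `bytes`
// that took `ms` milliseconds.
constexpr double copy_throughput_gb_per_s(std::uint64_t bytes, double ms) {
    return static_cast<double>(bytes) / (ms * 1e-3) / 1e9;
}

// Assumed resolutions: "4K" = 3840x2160, "8K" = 7680x4320.
constexpr std::uint64_t kBytes4K = rgba_frame_bytes(3840, 2160);
constexpr std::uint64_t kBytes8K = rgba_frame_bytes(7680, 4320);
```

Under those assumptions a 4K frame is 33,177,600 bytes and an 8K frame is 132,710,400 bytes, so `copy_throughput_gb_per_s(kBytes4K, 4.0)` and `copy_throughput_gb_per_s(kBytes8K, 16.0)` both come out at roughly 8.3 GB/s. The copy time grows in proportion to the frame size, which suggests the copy is limited by memory bandwidth rather than by a fixed per-call overhead.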
Actually I am just using a GStreamer pipeline like this for OpenCV capture: rtspsrc ! rtph265depay ! h265parse ! nvv4l2decoder ! nvvidconv ! video/x-raw,format=RGBA ! appsink
(I include ! nvvidconv ! video/x-raw,format=RGBA, otherwise the image comes out gray.)
Then I call cudaMemcpy to copy the frame to a CUDA device pointer, and that takes 16 ms at 8K resolution.
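For reference, the pipeline above can be assembled as a string in C++ before handing it to the capture code. This is only a sketch: `build_rtsp_pipeline` and its parameter are hypothetical names, and the `location=` property on rtspsrc is added here because the pipeline as posted omits the stream URL.

```cpp
#include <string>

// Builds the pipeline description quoted in the post for a given RTSP URL.
// Hypothetical helper for illustration; not part of OpenCV or GStreamer.
std::string build_rtsp_pipeline(const std::string& rtsp_url) {
    return "rtspsrc location=" + rtsp_url +
           " ! rtph265depay ! h265parse ! nvv4l2decoder"
           " ! nvvidconv ! video/x-raw,format=RGBA ! appsink";
}
```

Assuming the capture is a `cv::VideoCapture`, the resulting string would typically be passed as `cv::VideoCapture cap(build_rtsp_pipeline(url), cv::CAP_GSTREAMER);`. The frames it returns live in CPU memory, which is why the cudaMemcpy is still needed afterwards.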
I have read the link you posted.
I think it is for writing a GStreamer plugin.
I just want to get a GPU memory pointer that can be used for processing images in a normal C++ program, without a cudaMemcpy from host to device.
If OpenCV capture cannot do this, I can use other libraries.
(I thought jetson-utils should solve this problem, but as I said in my last post, there seems to be a bug in JetPack 5.x.)
@Up2U I haven't tried using NVMM with the V4L2 codecs since the JetPack 5.0.0 Developer Preview, so you might want to try again on the latest JetPack 5.0.2 and comment out that #undef. I’m not sure if the API for using NVMM with nvv4l2decoder had changed or if it was actually a bug.
OK thanks, I will make a note to look into it again from my end. It actually looks like it may be a different error related to your 8K 10-bit format, as I have not seen that particular error before regarding the formats.
Ahh okay thank you @Up2U for noticing that, I will make a note to look into changing that and identifying what other potential side-effects that may have for code that is already using the videoSource::GetFrameRate() function.
@DaneLLL Hi, I ran into some problems after updating to 5.0.2.
The cudaMemcpy from host to device, which is needed after OpenCV capture, takes noticeably longer than on 5.0.1.
For example, an 8K image took about 16 ms on 5.0.1, but now takes about 22 ms.
A 4K image took about 4 ms on 5.0.1, but now takes about 5 ms.
Also, video-viewer in jetson-utils can no longer display 8K video (it captures a few frames and then fails), whereas it worked on 5.0.1.
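To put the copy-time regression reported above in relative terms, here is a small sketch of the arithmetic; `percent_increase` is a hypothetical helper, and the inputs are the rounded timings from the post.

```cpp
// Percentage increase going from `before` to `after` (same units for both).
constexpr double percent_increase(double before, double after) {
    return (after - before) / before * 100.0;
}

// Timings quoted in the post, in milliseconds (5.0.1 -> 5.0.2).
constexpr double kSlowdown8K = percent_increase(16.0, 22.0);
constexpr double kSlowdown4K = percent_increase(4.0, 5.0);
```

With those figures the 8K copy is 37.5% slower and the 4K copy is 25% slower on 5.0.2. Since the inputs are rounded to whole milliseconds, the percentages are only indicative.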
Hi @Up2U, is this 8K video source from a file, or from an RTP/RTSP stream? If it’s a network stream, I’m inclined to think the connection timed out or something, since it was successfully capturing a bunch of frames before. It would be interesting to know if it happens with a video file read from disk too.
@dusty_nv Hi, I have done some testing on video-viewer with Nsight Systems.
I found that the behavior of the nvv4l2decoder plugin is quite different:
At 4K, there is a long blocking ‘ppoll’ call in every period.
Hi @Up2U, sorry I don’t have much insight into the nvv4l2decoder element, just the jetson-inference part, so you may want to create a new topic about that. Do you notice similar behavior if you run a standalone GStreamer pipeline with gst-launch-1.0 or with DeepStream? DeepStream is more optimized for high-bandwidth applications than jetson-inference is.
Thanks for your reply. @dusty_nv
I have also run gst-launch-1.0 at both 4K and 8K. Neither gives smooth video.
I cannot understand the timeline of this program; it is more complicated. The behavior of nvv4l2decoder in gst-launch-1.0 is not similar to its behavior in video-viewer.