Hi
My goal is to split three 2160p@30fps videos into twelve 1080p@30fps videos and share them with 12 other processes.
But I’m running into performance issues after the 10th video. Up to that point it works perfectly.
I have already spent several days on this issue, so here is what I have tried so far:
I set the VIC to max performance, as described in “Nvvideoconvert issue, nvvideoconvert in DS4 is better than Ds5? - #3 by DaneLLL”.
I’m splitting the video with the following GStreamer pipeline:
filesrc location=/media/developer/hero ! matroskademux name=demux1 demux1. ! nvv4l2decoder ! video/x-raw(memory:NVMM),format=NV12 ! queue ! tee name=t1
t1. ! queue ! nvvidconv left=0 right=1920 top=0 bottom=1080 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink0 sync=true
t1. ! queue ! nvvidconv left=1920 right=3840 top=0 bottom=1080 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink1 sync=true
t1. ! queue ! nvvidconv left=0 right=1920 top=1080 bottom=2160 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink2 sync=true
t1. ! queue ! nvvidconv left=1920 right=3840 top=1080 bottom=2160 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink3 sync=true
t1. ! queue ! nvvidconv left=0 right=1920 top=0 bottom=1080 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink4 sync=true
t1. ! queue ! nvvidconv left=1920 right=3840 top=0 bottom=1080 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink5 sync=true
t1. ! queue ! nvvidconv left=0 right=1920 top=1080 bottom=2160 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink6 sync=true
t1. ! queue ! nvvidconv left=1920 right=3840 top=1080 bottom=2160 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink7 sync=true
t1. ! queue ! nvvidconv left=0 right=1920 top=0 bottom=1080 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink8 sync=true
t1. ! queue ! nvvidconv left=1920 right=3840 top=0 bottom=1080 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink9 sync=true
t1. ! queue ! nvvidconv left=0 right=1920 top=1080 bottom=2160 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink10 sync=true
t1. ! queue ! nvvidconv left=1920 right=3840 top=1080 bottom=2160 ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12 ! queue ! appsink name=appsink11 sync=true
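For reference, the crop rectangles above follow a plain 2x2 grid over the 3840x2160 frame. Here is a minimal Python sketch that generates the four distinct branches for one source. The helper name `tile_branches` and its parameters are made up for illustration; it only builds the launch strings and does not touch GStreamer itself.

```python
def tile_branches(src_w=3840, src_h=2160, cols=2, rows=2, tee="t1", first_sink=0):
    """Return one gst-launch branch per tile, cropping with nvvidconv."""
    tile_w, tile_h = src_w // cols, src_h // rows
    caps = f"video/x-raw(memory:NVMM),width={tile_w},height={tile_h},format=NV12"
    branches = []
    for i in range(cols * rows):
        # Tiles are numbered row by row: 0 = top-left, 1 = top-right, ...
        left = (i % cols) * tile_w
        top = (i // cols) * tile_h
        branches.append(
            f"{tee}. ! queue ! nvvidconv left={left} right={left + tile_w} "
            f"top={top} bottom={top + tile_h} ! {caps} ! queue ! "
            f"appsink name=appsink{first_sink + i} sync=true"
        )
    return branches


if __name__ == "__main__":
    for line in tile_branches():
        print(line)
```

With the defaults this reproduces the appsink0 to appsink3 branches above; passing another `tee` name and `first_sink` would give the branches for a second or third source.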
The splitting works very well, and I get above 100 fps when sync is false. Awesome!
In the 12 GStreamer appsinks (12 different threads) I use 12 different NvBufferSession instances to copy the buffers with NvBufferTransformEx.
The destination buffer is set up with the same width and height, using NV12 and BlockLinear, and it is reused for every frame.
This copy operation seems quite expensive to me, because the frame rate is now below 75 fps. Any idea how to improve this rate?
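To put a number on how much data these copies move, here is a rough back-of-the-envelope sketch in Python. It assumes 8-bit NV12 at 1.5 bytes per pixel and ignores any pitch or block-linear padding, so the real figures will be somewhat higher; the function names are only illustrative.

```python
def nv12_frame_bytes(width, height):
    # NV12: full-resolution 8-bit luma plane plus a half-height interleaved
    # chroma plane, i.e. 1.5 bytes per pixel (padding is ignored here).
    return width * height * 3 // 2


def copy_traffic_bytes_per_s(width, height, fps, streams):
    # Each copy reads the source once and writes the destination once.
    return 2 * nv12_frame_bytes(width, height) * fps * streams


if __name__ == "__main__":
    for fps in (30, 75):
        gb = copy_traffic_bytes_per_s(1920, 1080, fps, 12) / 1e9
        print(f"12 x 1080p NV12 copies at {fps} fps: about {gb:.2f} GB/s")
```

This is only an estimate of the memory traffic of one copy stage. It does not by itself explain where the time goes.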
What’s weird is that when I transfer the file descriptor to another process and copy the buffer there with NvBufferTransformEx, I get less than 70 fps, although I use the exact same function.
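For context, handing a file descriptor to another process on Linux is usually done with SCM_RIGHTS over a Unix domain socket. Below is a minimal Python sketch of that generic mechanism (Python 3.9+, Unix only). A pipe stands in for the real buffer descriptor, the helper names `send_fd`, `recv_fd` and `demo` are hypothetical, and nothing here is specific to nvbuf_utils or shows how the received descriptor has to be used with NvBufferTransformEx.

```python
import os
import socket


def send_fd(sock, fd, frame_id):
    # The payload carries a frame id; the descriptor travels alongside it
    # as SCM_RIGHTS ancillary data.
    socket.send_fds(sock, [frame_id.to_bytes(8, "little")], [fd])


def recv_fd(sock):
    data, fds, _flags, _addr = socket.recv_fds(sock, 8, 1)
    return int.from_bytes(data, "little"), fds[0]


def demo():
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child process: receive the descriptor, read through it, report back.
        try:
            parent_sock.close()
            frame_id, fd = recv_fd(child_sock)
            payload = os.read(fd, 64)
            os.close(fd)
            child_sock.sendall(b"%d:%s" % (frame_id, payload))
            child_sock.close()
        finally:
            os._exit(0)
    child_sock.close()
    # Created after the fork, so the child cannot simply inherit it.
    r, w = os.pipe()
    os.write(w, b"frame-data")
    os.close(w)
    send_fd(parent_sock, r, 7)
    os.close(r)  # the receiver now holds its own reference
    reply = parent_sock.recv(64)
    parent_sock.close()
    os.waitpid(pid, 0)
    return reply


if __name__ == "__main__":
    print(demo())
```

The descriptor number in the receiving process is generally different from the one in the sender; only the underlying kernel object is shared.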
I lose even more time when each child process then encodes the stream:
appsrc name=mysource block=true
! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12,framerate=30/1
! queue
! nvv4l2h264enc maxperf-enable=1
! queue
! h264parse
! matroskamux
! filesink location=out sync=false
When I push the resulting file descriptors into this encoding pipeline, I get around 25 fps and a very unreliable stream.
Without any NvBufferTransformEx call, I’m able to split into twelve 1080p streams and encode them to H.264 at 90 fps.
Things work best when I remove all NvBufferTransformEx calls and do all the work in one process.
Things get worse when I add “appsink → NvBufferTransformEx → appsrc”. I can do 30 fps, but it’s on the edge.
As soon as inter-process sharing comes into play, the frame rate drops even further to ~25 fps.
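Expressed as time per frame, the figures above look like this. The small Python sketch below only converts the measured frame rates into milliseconds and compares them with the 30 fps real-time budget; the function names are illustrative.

```python
def frame_time_ms(fps):
    # Time available (or spent) per frame at a given frame rate.
    return 1000.0 / fps


def headroom_ms(measured_fps, target_fps=30.0):
    # Positive: spare time per frame; negative: the chain cannot keep up.
    return frame_time_ms(target_fps) - frame_time_ms(measured_fps)


if __name__ == "__main__":
    for label, fps in [("no transform", 90), ("with transform", 30), ("inter-process", 25)]:
        print(f"{label}: {frame_time_ms(fps):.1f} ms/frame, "
              f"headroom {headroom_ms(fps):+.1f} ms")
```

At 25 fps each frame takes 40 ms, which is about 6.7 ms more than the 33.3 ms a 30 fps stream allows, so frames have to be dropped or queued.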
So far all my research points to NvBufferTransformEx. It is “slow” at copying buffers, and even slower between two processes, although each copy happens in its own thread with its own NvBufferSession, and buffers, sessions and file descriptors are reused. The buffers are set up with NV12 and BlockLinear layout.
Are there any other tips to improve the speed of NvBufferTransformEx, besides running at the highest frequency?
Is there another way to share a video buffer between processes?
This thread is a follow-up to “Split video into 4 smaller ones - #7 by qwertzui11”.
Any help would be greatly appreciated,
Cheers,
Markus