Hi, I am working with a Jetson AGX Xavier. I am decoding a video and changing the Y (luma) value of every pixel. In the first case, I decode sample.mp4 with the following pipeline:
gst-launch-1.0 filesrc location=sample.mp4 ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! fakesink
This pipeline takes 3.7 seconds to complete.
However, when I add nvivafilter with nvsample_cudaprocess.cu and change the Y value of every pixel, the completion time rises to 9.5 seconds. The pipeline for the second case is:
gst-launch-1.0 filesrc location=sample.mp4 ! qtdemux ! h264parse ! nvv4l2decoder enable-max-performance=1 ! nvvidconv ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! nvivafilter cuda-process=true customer-lib-name="libnvsample_cudaprocess.so" ! 'video/x-raw(memory:NVMM), format=(string)NV12' ! fakesink
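To make the per-pixel Y modification concrete, here is a minimal standalone CUDA sketch of that kind of kernel, timed with CUDA events. It is not the actual code from nvsample_cudaprocess.cu: the kernel name `set_luma`, the 1920x1080 frame size, the replacement value 128, and the block shape are all assumptions chosen for illustration, and the buffer is a plain pitched allocation rather than a frame handed over by nvivafilter.

```cuda
#include <cstdio>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: one thread per pixel overwrites the luma (Y) sample.
// In NV12 the Y plane holds one byte per pixel; rows are `pitch` bytes apart.
__global__ void set_luma(uint8_t *y_plane, int width, int height,
                         size_t pitch, uint8_t value)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        y_plane[(size_t)y * pitch + x] = value;
}

int main()
{
    const int width = 1920, height = 1080;  // assumed frame size
    const uint8_t value = 128;              // assumed replacement Y value

    // Pitched allocation standing in for the Y plane of one frame.
    uint8_t *d_y = nullptr;
    size_t pitch = 0;
    if (cudaMallocPitch((void **)&d_y, &pitch, width, height) != cudaSuccess)
        return 1;
    cudaMemset2D(d_y, pitch, 0, width, height);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // 32 threads along x so that neighbouring threads touch neighbouring bytes.
    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);

    // Time only the kernel, not the allocation or the copy back.
    cudaEventRecord(start);
    set_luma<<<grid, block>>>(d_y, width, height, pitch, value);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    if (cudaGetLastError() != cudaSuccess)
        return 1;

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Copy the plane back (tightly packed) and verify every sample.
    std::vector<uint8_t> host((size_t)width * height);
    cudaMemcpy2D(host.data(), width, d_y, pitch, width, height,
                 cudaMemcpyDeviceToHost);
    bool ok = true;
    for (uint8_t v : host)
        ok = ok && (v == value);

    printf("kernel time: %.3f ms, result: %s\n", ms, ok ? "ok" : "mismatch");

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_y);
    return ok ? 0 : 1;
}
```

This can be built with something like `nvcc set_luma.cu -o set_luma` (the file name is hypothetical). Timing the kernel in isolation like this may help tell whether the kernel itself or another part of the second pipeline accounts for the extra time.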
All of the filtering is done on the GPU with CUDA. Is there a way to reduce the pipeline completion time? More generally, what are the usual ways to speed up GPU/CUDA processing so that the running time is as low as possible?