Question about nvv4l2decoder element

Hi,
When decoding 8K content on Orin 64GB, the following command should be able to achieve 30fps:

$ gst-launch-1.0 -v rtspsrc protocols=tcp location=rtsp://10.98.32.1/live_stream latency=0 ! rtph265depay ! h265parse ! nvv4l2decoder enable-max-performance=1 ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0

But the following command may not achieve 30fps:

$ gst-launch-1.0 -v rtspsrc protocols=tcp location=rtsp://10.98.32.1/live_stream latency=0 ! rtph265depay ! h265parse ! nvv4l2decoder enable-max-performance=1 ! nvvidconv ! video/x-raw,format=BGRx ! fpsdisplaysink text-overlay=0 video-sink=fakesink sync=0

This is because there is an additional memory copy in

... ! nvvidconv ! video/x-raw,format=BGRx

In this step, the decoded NVMM buffer (in I420) is converted to BGRx and then copied to a CPU buffer. The copy takes significant CPU load and can become the performance bottleneck.
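
To get a rough idea of how much data that copy moves, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes 8K means 7680x4320 (the post does not state the exact resolution) and that BGRx uses 4 bytes per pixel, and it ignores any stride or padding that the real buffers may have.

```shell
#!/bin/sh
# Rough estimate of the CPU-buffer traffic produced by the BGRx copy.
# Assumptions (not from the original post): 8K = 7680x4320, BGRx = 4 bytes/pixel.
width=7680
height=4320
bytes_per_pixel=4
fps=30

# Size of one BGRx frame in bytes
frame_bytes=$((width * height * bytes_per_pixel))

# Bytes that must be copied every second to sustain the target frame rate
bytes_per_sec=$((frame_bytes * fps))

echo "frame_bytes=$frame_bytes"
echo "bytes_per_sec=$bytes_per_sec"
echo "approx_MB_per_sec=$((bytes_per_sec / 1000000))"
```

Under these assumptions each frame is about 133 MB, so the pipeline would need to copy close to 4 GB every second into CPU memory to hold 30fps, which fits with the copy being the likely bottleneck.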