NvBuffer Transform much slower than OpenCV GPU

NvBufferTransform() executes on the hardware converter, so for optimal throughput, please run the converter at its maximum clock:
This should let NvBufferTransform()/NvBufferComposite() reach maximum throughput. In addition, if you have multiple threads calling NvBufferTransform(), please create an NvBufferSession.
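Below is a minimal sketch of the multi-threaded pattern. It assumes the common usage of nvbuf_utils: each worker thread creates its own session with NvBufferSessionCreate(), sets it in the `session` field of the NvBufferTransformParams passed to every NvBufferTransform() call, and releases it with NvBufferSessionDestroy() when the thread exits. Because the real API is C and needs Jetson hardware, the sketch uses clearly hypothetical stand-ins (`FakeSession`, `fake_session_create`, `fake_transform`, `fake_session_destroy`) that only record usage. It illustrates the ownership structure, not the library itself.

```python
import threading
from dataclasses import dataclass


@dataclass
class FakeSession:
    """Hypothetical stand-in for an NvBufferSession handle."""
    owner: int
    transforms_run: int = 0


def fake_session_create(owner: int) -> FakeSession:
    """Hypothetical stand-in for NvBufferSessionCreate()."""
    return FakeSession(owner)


def fake_session_destroy(session: FakeSession) -> None:
    """Hypothetical stand-in for NvBufferSessionDestroy(); marks the session released."""
    session.owner = -1


def fake_transform(src_fd: int, dst_fd: int, session: FakeSession) -> int:
    """Hypothetical stand-in for NvBufferTransform() with params.session set.

    It only counts that this session was used; returning 0 mimics success.
    """
    session.transforms_run += 1
    return 0


def worker(thread_id: int, frames: int, results: dict) -> None:
    # Each worker owns exactly one session and never shares it with other threads.
    session = fake_session_create(thread_id)
    try:
        for i in range(frames):
            fake_transform(i, i + 1, session)
        # Record the session owner and usage count before releasing it.
        results[thread_id] = (session.owner, session.transforms_run)
    finally:
        fake_session_destroy(session)


def run_workers(num_threads: int, frames: int) -> dict:
    """Start one worker per thread, wait for all of them, and return per-thread stats."""
    results: dict = {}
    threads = [
        threading.Thread(target=worker, args=(t, frames, results))
        for t in range(num_threads)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


if __name__ == "__main__":
    for tid, (owner, count) in sorted(run_workers(4, 100).items()):
        print(f"thread {tid}: {count} transforms on the session it created (owner {owner})")
```

The design choice is simply that no session crosses thread boundaries, so threads do not contend for a shared default session. In C on the device, the same shape maps to a `NvBufferSession` local variable per thread, assigned to `params.session` before the transform loop.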