I’m on Windows 10 x64 with the CUDA 10 SDK, and I use the NVENC API in my C++ program.
I’m encoding 4K frames (3840 x 2160).
I started profiling with Nsight Systems and was surprised to see that the encoder internally uses Stream 0 (the default CUDA stream) to convert my input format into the one the encoder needs.
My goal is to avoid any conversion when encoding a frame (i.e. no more work on Stream 0 and no conversion kernel launched).
So I tested different input formats (YUV444, IYUV, YV12 and NV12). According to the profiler, NV12 seems to be the “fastest” one: it “only” launches the kernel named “Convert_PL2BL” twice.
I didn’t find a way to remove this call. I think it’s launched twice, once for luma and once for chroma.
What is the purpose of this kernel?
I tried feeding an image with contiguous data (i.e. the pitch equals the image width), but the encoder still calls this kernel.
Is there an input format I can give the encoder that disables this internal conversion kernel?
@oamoros0ealf Thank you very much for your post and your find!
In the past I just kept using Stream 0 for that project, since no other solution was found…
But I will definitely test this in a future project that needs a very optimized pipeline!