NVENC fastest convert (no Stream 0)


I’m on Windows 10 x64 with the CUDA 10 SDK, and I use the NVENC API in my C++ program.
I’m encoding 4K frames (3840×2160).

I started profiling with Nsight Systems and was surprised to see that the encoder uses Stream 0 (through an internal call) to convert the input format to the one the encoder needs.

My goal is to remove any conversion when encoding a frame (i.e. no more use of Stream 0 and no kernel launched).

So I tested different input formats (YUV444, IYUV, YV12 and NV12), and it seems that NV12 is the “fastest” format: according to the profiler, it “only” launches the kernel named “Convert_PL2BL” twice.
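For context on the formats compared above, here is a minimal sketch of how their 8-bit buffers differ in plane count and size. The `YuvLayout` enum and both helper functions are hypothetical names used only for illustration; they are not part of the NVENC API, and the sizes assume tightly packed rows (pitch equal to width).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type for illustration; not part of the NVENC API.
enum class YuvLayout { YUV444, IYUV, YV12, NV12 };

// Number of separate planes in one 8-bit frame.
// NV12 has a luma plane plus one interleaved UV plane; the others are fully planar.
constexpr int plane_count(YuvLayout f) {
    return f == YuvLayout::NV12 ? 2 : 3;
}

// Total bytes of one tightly packed 8-bit frame (assumes pitch == width).
inline std::size_t frame_bytes(YuvLayout f, std::size_t width, std::size_t height) {
    // 4:2:0 subsampling needs even dimensions.
    assert(width % 2 == 0 && height % 2 == 0);
    const std::size_t luma = width * height;
    if (f == YuvLayout::YUV444) {
        return 3 * luma;      // three full-resolution planes
    }
    return luma + luma / 2;   // 4:2:0: chroma totals half the luma size
}
```

For a 3840×2160 frame, `frame_bytes` gives 24,883,200 bytes for YUV444 and 12,441,600 bytes for each of the 4:2:0 formats, so the three 4:2:0 formats differ only in how the chroma samples are arranged, not in size.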

I didn’t find a way to remove this call; I think it’s launched twice, once for luma and once for chroma.
What is the purpose of this kernel?
I tried inputting an image with contiguous data (i.e. the pitch is equal to the image width), but the encoder still calls this kernel.
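To make the contiguous case concrete, the following is a minimal sketch of an 8-bit NV12 buffer layout in which the chroma plane starts directly after the luma plane and both share one pitch. `Nv12Layout`, `nv12_layout`, `luma_index` and `cb_index` are hypothetical names for illustration only. The sketch shows the host-side arithmetic and says nothing about what the encoder does internally.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a single-allocation NV12 frame (8-bit).
struct Nv12Layout {
    std::size_t pitch;          // bytes per row, shared by both planes
    std::size_t luma_offset;    // start of the Y plane
    std::size_t chroma_offset;  // start of the interleaved UV plane
    std::size_t total_bytes;    // size of the whole allocation
};

// Builds the layout; pitch == width means rows are stored with no padding.
inline Nv12Layout nv12_layout(std::size_t width, std::size_t height, std::size_t pitch) {
    assert(pitch >= width);
    assert(width % 2 == 0 && height % 2 == 0);
    Nv12Layout l;
    l.pitch = pitch;
    l.luma_offset = 0;
    l.chroma_offset = pitch * height;                      // UV follows Y directly
    l.total_bytes = pitch * height + pitch * (height / 2); // UV plane has half the rows
    return l;
}

// Byte index of the luma sample for pixel (x, y).
inline std::size_t luma_index(const Nv12Layout& l, std::size_t x, std::size_t y) {
    return l.luma_offset + y * l.pitch + x;
}

// Byte index of the Cb sample shared by the 2x2 block containing pixel (x, y).
// The matching Cr sample is the next byte.
inline std::size_t cb_index(const Nv12Layout& l, std::size_t x, std::size_t y) {
    return l.chroma_offset + (y / 2) * l.pitch + (x / 2) * 2;
}
```

With `nv12_layout(3840, 2160, 3840)`, the chroma plane begins at byte 8,294,400 and the whole frame occupies 12,441,600 bytes.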

Is there a format I can pass to the encoder that will disable the internal kernel call for conversion?


Can anyone shed some light on the above question?

I think you’re more likely to get help with a question like this by asking it on the Video Codec forum:

Video Processing & Optical Flow - NVIDIA Developer Forums

I added some info in this post that can solve your issue.

@oamoros0ealf Thank you very much for your post, and your find!

In the past I just kept using Stream 0 for the project, since no other solution was found…
But rest assured I will test this in a future project that needs a highly optimized pipeline!

Thank you!
