How to avoid Convert_PL2BL in NVENC?

Hi,

Due to driver issues in the R440 and R450 Quadro driver versions, we need to stick with the R410 and R430 driver branches. With those drivers, the highest Video Codec SDK version we can use is 9.0.

For performance reasons, we need to avoid the execution of the Convert_PL2BL kernel, since it breaks most of the overlapping (of kernels and memory transfers) that we have built into our software.

Is there any way to pass a Block Linear memory layout directly to NVENC? We would then write our own conversion kernel, so that we can execute it in a non-default stream.
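To make the intent concrete, here is a minimal sketch of the pattern we are after. Everything in it is illustrative: `myConvertKernel`, `divUp` and `runOverlapSketch` are hypothetical names, the kernel body is only a pitch-linear copy (we do not know the actual Block Linear layout NVENC expects), and the frame size is an arbitrary assumption. The only point is that the upload and the conversion are enqueued on non-default streams, so they do not serialize against the rest of our pipeline.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Host helper: number of blocks needed to cover n elements.
static int divUp(int n, int block) { return (n + block - 1) / block; }

// Hypothetical stand-in for our own conversion kernel. It only copies
// pitch-linear rows; the real Block Linear addressing is unknown to us.
__global__ void myConvertKernel(const unsigned char* src, size_t srcPitch,
                                unsigned char* dst, size_t dstPitch,
                                int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        dst[y * dstPitch + x] = src[y * srcPitch + x];
}

// Returns 0 on success, 1 on any CUDA error (resources are not freed on the
// error path to keep the sketch short).
#define CHECK(call) do { if ((call) != cudaSuccess) return 1; } while (0)

int runOverlapSketch()
{
    const int width = 1920, height = 1080;   // assumed 8-bit luma plane
    unsigned char *hostFrame = nullptr, *devSrc = nullptr, *devDst = nullptr;
    size_t srcPitch = 0, dstPitch = 0;
    cudaStream_t uploadStream, convertStream;
    cudaEvent_t uploaded;

    // Pinned host memory is required for truly asynchronous copies.
    CHECK(cudaMallocHost(&hostFrame, (size_t)width * height));
    CHECK(cudaMallocPitch(&devSrc, &srcPitch, width, height));
    CHECK(cudaMallocPitch(&devDst, &dstPitch, width, height));

    // Non-blocking streams do not synchronize with the legacy default stream.
    CHECK(cudaStreamCreateWithFlags(&uploadStream, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&convertStream, cudaStreamNonBlocking));
    CHECK(cudaEventCreateWithFlags(&uploaded, cudaEventDisableTiming));

    // Upload on one stream, then signal completion with an event.
    CHECK(cudaMemcpy2DAsync(devSrc, srcPitch, hostFrame, width, width, height,
                            cudaMemcpyHostToDevice, uploadStream));
    CHECK(cudaEventRecord(uploaded, uploadStream));

    // The conversion waits only on that event, not on unrelated work.
    CHECK(cudaStreamWaitEvent(convertStream, uploaded, 0));
    dim3 block(16, 16);
    dim3 grid(divUp(width, block.x), divUp(height, block.y));
    myConvertKernel<<<grid, block, 0, convertStream>>>(devSrc, srcPitch,
                                                       devDst, dstPitch,
                                                       width, height);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(convertStream));

    cudaEventDestroy(uploaded);
    cudaStreamDestroy(uploadStream);
    cudaStreamDestroy(convertStream);
    cudaFree(devSrc);
    cudaFree(devDst);
    cudaFreeHost(hostFrame);
    return 0;
}
```

Calling `runOverlapSketch()` from `main` exercises one frame. In our real pipeline the output buffer of such a kernel is what we would like to hand to NVENC directly, if the API allows it.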

Unfortunately, we cannot use the new API methods in Video Codec SDK 9.1, because the drivers that support it crash under more than 50% load on dual-GPU systems. (We are working on a reproducer to demonstrate that.)

Thanks!