What is Convert_PL2BL?

@dreqeu and @gmaxi17, or anyone facing performance issues with NVENCODE due to internal pre-processing or post-processing CUDA kernels.

In case you still haven’t found a solution to this problem, let me summarize:

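Some time ago, NVENCODE started to use pre-processing and/or post-processing CUDA kernels inside the API calls. These kernels were using the default stream, which can cause big performance issues: work on the (legacy) default stream implicitly synchronizes with work on other blocking streams, so the encoder's kernels end up serialized with your own.

A solution to this would be to be able to specify your own CUDA streams for pre- and post-processing.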

I found that starting with Video Codec SDK version 9.1, you can do exactly that.

NEW to 9.1 - Encode: CUStream support in NVENC for enhanced parallelism between CUDA pre-processing and NVENC encoding

In the Video Codec SDK examples, you can find how to use this new feature in the file AppEncode/AppEncCuda.cpp.

You need to use NvEncoderCuda::SetIOCudaStreams(NV_ENC_CUSTREAM_PTR inputStream, NV_ENC_CUSTREAM_PTR outputStream) to set the pre- and post-processing CUDA streams. You can use the same stream for both or different ones. Notice that NvEncoderOutputInVidMemCuda inherits from NvEncoderCuda, so this public method is also available from an NvEncoderOutputInVidMemCuda instance. I did not check whether other classes inherit from NvEncoderCuda.

The API expects CUstream handles (the CUDA Driver API type), but as far as I know it's perfectly fine to pass a cudaStream_t created via the CUDA runtime API, since both are handles to the same underlying stream object.
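For anyone who wants to see the shape of the call, here is a minimal sketch. It is not taken from the SDK sample. To keep it compilable without the CUDA toolkit or the Video Codec SDK, the stream types and the encoder are stand-ins that I declare myself (FakeEncoder, EncoderStreams and attachStreams are hypothetical names). One assumption to double-check against your SDK version: as far as I remember, NV_ENC_CUSTREAM_PTR is a void* that points to a CUstream, so you pass the address of the stream handle and that handle has to stay alive while the encoder uses it.

```cpp
#include <cassert>

// Stand-ins so this compiles on its own. In real code these come from
// <cuda.h>, <cuda_runtime.h> and nvEncodeAPI.h.
struct CUstream_st;                  // opaque stream object
typedef CUstream_st* CUstream;       // driver API handle
typedef CUstream_st* cudaStream_t;   // runtime API handle, same underlying type
typedef void* NV_ENC_CUSTREAM_PTR;   // ASSUMPTION: pointer TO a CUstream

// Hypothetical stand-in for NvEncoderCuda: it only records what it is given.
struct FakeEncoder {
    NV_ENC_CUSTREAM_PTR input = nullptr;
    NV_ENC_CUSTREAM_PTR output = nullptr;
    void SetIOCudaStreams(NV_ENC_CUSTREAM_PTR in, NV_ENC_CUSTREAM_PTR out) {
        input = in;
        output = out;
    }
};

// Owns the two handles so the addresses handed to the encoder remain valid.
struct EncoderStreams {
    CUstream pre = nullptr;   // stream for the encoder's pre-processing kernels
    CUstream post = nullptr;  // stream for the encoder's post-processing kernels
};

// A runtime-API stream can be used where a driver-API stream is expected;
// no cast is needed because both typedefs name the same pointer type.
inline CUstream asDriverStream(cudaStream_t s) { return s; }

// Passes the ADDRESSES of the stream handles, not the handles themselves.
template <typename Encoder>
void attachStreams(Encoder& enc, EncoderStreams& s) {
    enc.SetIOCudaStreams(static_cast<NV_ENC_CUSTREAM_PTR>(&s.pre),
                         static_cast<NV_ENC_CUSTREAM_PTR>(&s.post));
}
```

In a real application you would fill EncoderStreams with streams created through cudaStreamCreate or cuStreamCreate, call attachStreams on your NvEncoderCuda (or NvEncoderOutputInVidMemCuda) instance after creating the encoder, and keep the EncoderStreams object alive at least as long as the encoder.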

Hope it helps!
