Configuring encoder Cuda streams, SDK 9.1

l.ayuso · October 1, 2019, 1:23pm

I am very interested in the Cuda stream support for my encoder tasks, but so far I haven’t managed to get it running.

There is very little documentation on the topic; I have found only two mentions so far:

A short paragraph in the application note PDF:

Support for CUStream has been added in NVENCODE API to enable execution of preprocessing and postprocessing CUDA kernels on separate client specified CUDA streams instead of default NULL stream. This results in better pipelining and improved throughput when NVENCODE API is used along with CUDA operations.

The header file itself:

Encoding may involve CUDA pre-processing on the input and post-processing on encoded output.
This function is used to set input and output CUDA streams to pipeline the CUDA pre-processing
and post-processing tasks. Clients should call this function before the call to
NvEncUnlockInputBuffer(). If this function is not called, the default CUDA stream is used for
input and output processing. After a successful call to this function, the streams specified
in that call will replace the previously-used streams.
This API is supported for NVCUVID interface only.
[…]

When I register my stream in this function, I get a plain segfault on the next nvEncEncodePicture call. Not even an error code, just a crash.

Thread 1 "encoder" received signal SIGSEGV, Segmentation fault.
0x00007ffff2c6120d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
(gdb) bt
#0  0x00007ffff2c6120d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1  0x00007ffff2c7f454 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff2e3943d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff056e06d in ?? () from /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#4  0x00007ffff062c70c in ?? () from /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#5  0x00007ffff062775a in ?? () from /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#6  0x00007ffff061c67f in ?? () from /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
#7  0x00007ffff7955618 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1
#8  0x00007ffff79514b9 in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1
#9  0x00007ffff7960dcb in ?? () from /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1
#10 0x000055555560cf5d in encode_frame (this=0x555556e83f78, resources=..., output=std::vector of length 0, capacity 0) at *******/encoder.cpp:647

The questions are the following:

Is there better documentation for the new feature?
Does this feature work when using RGB CUDA buffers as input? The function comment only mentions NvEncUnlockInputBuffer, which is not the way we use the API.
Is this feature supported on Linux at all?
In a single-configuration scenario, when is the right time to configure this flag? Before encoding? After session generation? Do I need to do it every frame?

Bonus questions:

Can the input stream and the output stream be the same? I guess so, as both are the default stream by default.
Can one of the two streams remain the default stream?

mandar_godse · October 14, 2019, 11:43am

Hi l.ayuso,
Please have a look at AppEncCuda sample application in the Video Codec SDK 9.1.
This application accepts the command line parameter ‘-cuStreamType’, which demonstrates how to use CUDA streams for pre/post processing. Current usage in the sample application is implemented to work with ‘outputInVidMem’ set as 1. But the API/feature itself can be implemented and used without ‘outputInVidMem’.

Let me know, if you have any further questions.

Thanks.

Scott.j.taylor · December 9, 2019, 8:31pm

I stumbled on this issue…
I believe the header documentation for the nvEncSetIOCudaStreams API is incorrect, as I needed to dereference the input and output stream pointers for it to work without crashing. The example program AppEncCuda.cpp does this and then casts the result to fit the function signature.

NVENCSTATUS NVENCAPI NvEncSetIOCudaStreams (void* encoder, NV_ENC_CUSTREAM_PTR inputStream, NV_ENC_CUSTREAM_PTR outputStream);

should be:

NVENCSTATUS NVENCAPI NvEncSetIOCudaStreams (void* encoder, NV_ENC_CUSTREAM_PTR* inputStream, NV_ENC_CUSTREAM_PTR* outputStream);

mandar_godse · December 27, 2019, 10:49am

Hi.

inputStream and outputStream are of type CUstream. NvEncSetIOCudaStreams() expects NV_ENC_CUSTREAM_PTR, hence their addresses are passed after typecasting. It is not clear to me where you need to dereference the input and output stream pointers, or what results in the crash you are seeing.

Thanks.

l.ayuso · January 27, 2020, 4:02pm

After giving it a second try, it worked just fine. Just as mentioned, the API expects a pointer to the stream and not the stream itself (which is a pointer too).

The stream must be set after the nvEncInitializeEncoder (it is not enough to call it after nvEncOpenEncodeSessionEx, that will crash).

Thanks for your time.
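
For anyone landing here later, this is a minimal sketch of the two points above: pass the address of the CUstream variable, and only do so once the encoder has been initialized. It deliberately does not include the SDK headers. CUstream_st, MockEncoder, mockInitializeEncoder, mockSetIOCudaStreams and attachStreams are hypothetical stand-ins written for illustration, and the void* typedef for NV_ENC_CUSTREAM_PTR reflects my reading of nvEncodeAPI.h, so check it against your SDK version. The mock returns an error code where the real driver segfaulted.

```cpp
#include <cassert>

// Stand-ins so the sketch compiles without the CUDA / Video Codec SDK headers.
// Assumption: in the real headers CUstream is an opaque pointer type and
// NV_ENC_CUSTREAM_PTR is a void* meant to hold the ADDRESS of a CUstream.
struct CUstream_st { int id; };          // hypothetical body; opaque in CUDA
typedef CUstream_st* CUstream;
typedef void* NV_ENC_CUSTREAM_PTR;

// Hypothetical mock of an encoder session, not part of any NVIDIA API.
struct MockEncoder {
    bool initialized = false;
    CUstream input = nullptr;
    CUstream output = nullptr;
};

enum MockStatus { MOCK_SUCCESS = 0, MOCK_ENCODER_NOT_INITIALIZED = 1 };

// Stands in for nvEncInitializeEncoder.
inline MockStatus mockInitializeEncoder(MockEncoder* enc) {
    enc->initialized = true;
    return MOCK_SUCCESS;
}

// Stands in for nvEncSetIOCudaStreams: it reads each argument as a
// pointer to a CUstream. Passing the CUstream itself would still compile,
// because the parameter is void*, but it would be misread here.
inline MockStatus mockSetIOCudaStreams(MockEncoder* enc,
                                       NV_ENC_CUSTREAM_PTR inputStream,
                                       NV_ENC_CUSTREAM_PTR outputStream) {
    if (!enc->initialized) {
        // The thread reports a crash in this situation; the mock reports an error.
        return MOCK_ENCODER_NOT_INITIALIZED;
    }
    enc->input = *static_cast<CUstream*>(inputStream);
    enc->output = *static_cast<CUstream*>(outputStream);
    return MOCK_SUCCESS;
}

// Typed wrapper: taking CUstream* makes it hard to pass the stream by value.
inline MockStatus attachStreams(MockEncoder* enc, CUstream* in, CUstream* out) {
    return mockSetIOCudaStreams(enc,
                                static_cast<NV_ENC_CUSTREAM_PTR>(in),
                                static_cast<NV_ENC_CUSTREAM_PTR>(out));
}
```

With the real SDK the same shape would be: create the streams, open the session, initialize the encoder, then pass &inputStream and &outputStream cast to NV_ENC_CUSTREAM_PTR. A small typed wrapper like attachStreams is one way to keep the void* parameter from silently accepting the wrong pointer level.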