Jetson Nano HW encoder/decoder conflicting documentation

Unfortunately, the tonemapping filter I designed for that post/thread needs to take decoded footage as input before it can be used.

It seems that in CUDA 8 the ability to use CUDA cores to handle the encoding and decoding of content was still available, but it has been deprecated in more recent versions. The relevant components were nvcuvid and nvcuvenc, but unfortunately they haven't been updated since.

From what I can tell, the language/SDK/API has no native mechanism in modern versions to fall back on CUDA in the case where a specific format is unsupported.

Now, in the case of discrete cards in a desktop/typical x86 machine you would fall back to the CPU, but on the Jetson, across different codecs/bitrates/resolutions, all-software versus all-hardware handling shows a literal order-of-magnitude difference in performance (and the Jetson was in 5 W mode for the hardware runs, as compared to MAXN mode plus jetson_clocks for the software runs, to try and give every advantage to software handling).

As of now I see 3 solutions:

1. Patch the JetPack driver/firmware to allow the Jetson to take 10-bit input into the decoder, with the knowledge that the hardware encoder will truncate the last 2 bits of every channel. You would not be able to leverage the increased quality of 10-bit input, but it would allow for some functionality. The API would probably throw a warning to the console saying that it is an imperfect decode, but it's better than nothing. (I believe this would be the easiest solution, but it would only apply to the Nano.) A rough sketch of the truncation step follows below.
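
To be clear about what I mean by truncation, here is a minimal sketch (nothing here is real driver code; the plane layout, pitch convention, and kernel name are assumptions on my part). It assumes each 10-bit sample sits in the low bits of a 16-bit word and simply drops the 2 least significant bits to produce an 8-bit sample:

#include <cstdint>

// Hypothetical illustration only: convert one plane of 10-bit samples
// (one sample per 16-bit word, value in the low 10 bits) to 8-bit samples
// by dropping the 2 least significant bits. Pitches are given in elements.
__global__ void truncate10to8(const uint16_t* src, uint8_t* dst,
                              int width, int height,
                              int srcPitch, int dstPitch)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;

    uint16_t sample10 = src[y * srcPitch + x] & 0x3FF;               // keep the 10 valid bits
    dst[y * dstPitch + x] = static_cast<uint8_t>(sample10 >> 2);     // drop the 2 LSBs
}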

2. A feature request is accepted and delivered, reintroducing a more modern implementation of the CUDA-based decoder that is then available (but must be explicitly enabled by the program) as a graceful fallback when a specific profile/codec/codec sub-specification is unsupported on the hardware in question. This would be useful on any and all CUDA-compliant devices, at the known cost of lower per-stream performance and/or higher power consumption for a given stream. From an ecosystem standpoint (and with the AV1 codec coming in soon), this would allow for an expansion of capability across all CUDA devices and give developers the ability to support a larger number of devices without having to split workloads between host CPU and host GPU depending on the configuration.
The check would go from the current implementation of

// Check if content is supported
if (!decodecaps.bIsSupported) {
    NVDEC_THROW_ERROR("Codec not supported on this GPU", CUDA_ERROR_NOT_SUPPORTED);
}

To now being

// Check if content is supported in hardware
if (!decodecaps.bIsSupported) {
    // Check if the CUDA compatibility decoder is allowed AND enabled
    if (!decode_cuda_enabled)
    {
        NVDEC_THROW_ERROR("Codec not supported on this GPU", CUDA_ERROR_NOT_SUPPORTED);
    }
    else
    {
        // Something that remaps cuvidCreateDecoder() to a new function call that
        // implements a CUDA-based decoder as the decoder entry point.
        // Perhaps cuvidCreateCudaDecoder()
        printf("CUDA compatibility decoder enabled, performance will be reduced!\n");
    }
}

Then the pipeline would continue as it normally would for the NVDEC API.
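
For context, the decodecaps structure in the snippets above gets filled in by cuvidGetDecoderCaps() before the check runs. Here is a rough, self-contained sketch of that query (the helper name and the HEVC 10-bit 4:2:0 example values are placeholders I picked, not lifted from the SDK samples verbatim):

#include <cstring>
#include <nvcuvid.h>

// Example capability query; a CUDA context must already be current on the
// calling thread. Codec, chroma format and bit depth are just example values.
static bool IsHwDecodeSupported()
{
    CUVIDDECODECAPS decodecaps;
    memset(&decodecaps, 0, sizeof(decodecaps));
    decodecaps.eCodecType      = cudaVideoCodec_HEVC;        // e.g. HEVC
    decodecaps.eChromaFormat   = cudaVideoChromaFormat_420;  // 4:2:0
    decodecaps.nBitDepthMinus8 = 2;                          // 10-bit content

    if (cuvidGetDecoderCaps(&decodecaps) != CUDA_SUCCESS)
        return false;

    // bIsSupported is what the checks above test; when the silicon cannot
    // handle the requested profile it comes back 0, which is exactly where
    // the proposed CUDA fallback would take over.
    return decodecaps.bIsSupported != 0;
}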

This is by far the most preferred scenario, as it enables greater compatibility and fewer headaches for any developer targeting a large diversity of user systems.

3. Somehow put together a custom CUDA-based encoder of my own, specifically targeted at the Nano, that is limited in feature set and very application specific.

That's my concern. Ideally, if I could get the source or a general outline of the CUDA 8 nvcuvid functions, I could implement them and update them for other codecs.