Graceful fallback to CUDA decoding when format not supported by dedicated hardware block

Hi everyone!

Looking for some guidance on implementing graceful fallback when a specific codec/profile is not supported on the user’s specific hardware.

I know this functionality was supported in prior CUDA versions (I can find documentation up to CUDA 8) but can’t find any references to it in more modern versions of CUDA.

For example, users of certain devices don’t have hardware support for 10-bit HEVC decoding, so if that’s detected I’d like to fall back to CUDA.

I realize this would involve a performance hit, but would still be faster than dealing with CPU decoding. It also makes supporting a larger base of hardware much easier.
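Since NVDEC capabilities can be queried per codec/profile at runtime (via `cuvidGetDecoderCaps` in the Video Codec SDK), the gating logic for a fallback like this is straightforward. Here is a minimal sketch of that decision in plain C++ — `DecodeCaps`, `Backend`, and `pickBackend` are hypothetical stand-ins I made up for illustration, not SDK types:

```cpp
#include <cassert>

// Hypothetical stand-in for the fields a real CUVIDDECODECAPS query
// (cuvidGetDecoderCaps) would fill in: whether the dedicated hardware
// block supports this codec/profile, and the stream's bit depth.
struct DecodeCaps {
    bool hwSupported;  // conceptually CUVIDDECODECAPS::bIsSupported
    int  bitDepth;     // e.g. 8 or 10
};

// Possible decode paths, assuming a custom CUDA-kernel decoder exists
// as a middle tier between NVDEC and plain CPU decoding.
enum class Backend { Nvdec, CudaKernels, Cpu };

// Prefer the fixed-function NVDEC block; otherwise fall back to the
// (hypothetical) CUDA-core decoder; a CPU decoder is the last resort.
Backend pickBackend(const DecodeCaps& caps, bool haveCudaDecoder) {
    if (caps.hwSupported) return Backend::Nvdec;
    if (haveCudaDecoder)  return Backend::CudaKernels;
    return Backend::Cpu;
}
```

In a real program the caps struct would be populated by one `cuvidGetDecoderCaps` call per (codec, chroma format, bit depth) combination before creating the decoder.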

Any guidance is very much appreciated!

Is this doc from the Video SDK V11.0 any use: NVDEC Video Decoder API Programming Guide :: NVIDIA Video Codec SDK Documentation

Unfortunately it doesn’t look like it: it seems to only reference the hardware decoder, or to be ambiguous. Here for example: NVDEC Video Decoder API Programming Guide :: NVIDIA Video Codec SDK Documentation

I just can’t seem to find confirmation one way or the other. I don’t know why it wouldn’t exist anymore, but I’ve been burned before on this sort of thing.

nvcuvid and nvcuvenc are deprecated (or dropped/unavailable) and certainly not recommended for any new work/designs.

nvdec/nvenc are the current technologies, and there is no “cuda fallback” provided by NVIDIA, currently.

Can you write your own CUDA-based decoder? Certainly. Obviously such an effort is considerably larger than the effort of using an existing codec technology, and as a practical matter, in my opinion, that is not the sort of context you are asking about. Of course anything is possible with sufficient effort.


Hi Robert,

In that case, for users that do not have access to certain profiles/codec versions, is the expectation to fail/exit the application?

I think there are multiple possibilities. For example, couldn’t you switch to another codec, e.g. a CPU-based codec, if you can’t find a GPU-based codec that does what you want?

There is no “expectation” inherent in the technology. The “expectation” is whatever you decide it should be as a programmer. Fail/exit is one possible outcome. I’m sure there are other outcomes that could be arrived at, with sufficient programming effort.

Suppose you pull up to a filling station in your ICE car. Suppose you discover that the regular unleaded variety of fuel is not available at that service station. What is the “expectation” at that point? I can see at least two possible avenues:

  • you could look for another service station that has regular unleaded fuel
  • you could switch to premium fuel (if it happened to be available at that service station)

Allow me to rephrase that: the standard behaviour, when trying to use hardware acceleration on hardware that happens not to support a specific codec, is to fail/exit. The technology/API does not currently have a native mechanism to fall back to using SMs/CUDA cores when the requested codec, or version of a codec, is unsupported.
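Since no native fallback exists, the fallback chain has to be wired up by the application itself: attempt to create the NVDEC decoder, and on failure construct whatever secondary decoder you can provide. A hedged sketch of that pattern — `Decoder`, `Factory`, and `makeDecoder` are invented names for illustration, not SDK API:

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Invented placeholder for "some decoder object". In practice this would
// wrap a CUvideodecoder handle, a custom CUDA-kernel decoder, or a
// CPU/software codec context.
struct Decoder {
    std::string backendName;
};

// A factory tries to construct one kind of decoder and reports failure
// (e.g. unsupported profile on this chip) by returning nullopt.
using Factory = std::function<std::optional<Decoder>()>;

// Walk a preference-ordered list of factories and return the first
// decoder that can actually be created on this machine.
std::optional<Decoder> makeDecoder(const std::vector<Factory>& factories) {
    for (const auto& f : factories) {
        if (auto d = f()) return d;  // this backend is available
    }
    return std::nullopt;             // nothing available: the fail/exit case
}
```

The preference order would typically be NVDEC first, then any custom GPU path, then a CPU decoder as the last resort.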

The context behind this is an ARM-based compute cluster I’m working on. I’ve begun with a few Jetsons as a POC, and will probably move to Cadmium ARM servers with discrete cards later on, or some other vendor.

Because codecs can be and are updated/expanded every few years, it would be very useful to be able to fall back to a secondary mechanism that is still accelerated, rather than attempting to decode on the CPU. If at all possible I’d like to spend more of the chassis power budget on GPUs rather than having to budget for higher CPU resources, even when a large portion of the GPU hardware is idling.

For example, the relative performance loss of moving from a Maxwell-era hardware decode → encode chain to 4 ARM cores @ 1.5 GHz, at the same bitrate and the same profile, is an entire order of magnitude. Testing methodology is at the end of the post.

What I’m looking for is something in between. Because we will always want to use the hardware encoder, processing on the GPU is both faster, since the hardware is better adapted/leveraged, and removes the need to move data over the PCIe bus, since everything is presumably already in video memory. This also better allows for CUDA-based image processing, for the same reason.

I’m aware that using the ASIC on board the cards would be ideal in most cases, but that unfortunately isn’t always possible.

Testing was done with a Jetson Nano 2 GB and 4 GB, across multiple file types, codecs, bitrates, etc. Testing resulted in a ~7% standard deviation across all scenarios.

For hardware encoding/decoding, the Nano was in 5 W mode with only 2 cores active and jetson_clocks off.
For CPU encoding/decoding, the Nano was in NVMAX mode with jetson_clocks on.

Perhaps you are making a feature request (it also sounds like a lot of work…). If you wish, you can file feature requests using the bug reporting method at the top of this sub forum in a sticky post. You may also get better help with Jetson questions by posting in the Jetson forums.


Looks like the functionality no longer exists, so yes, I do believe this would fall under a feature request. I’ve begun filling out the bug report form, but am not clear on how best to submit this correctly. Is there a specific way to submit a feature request?

Do you believe this would fall under ComputeWorks? And if so, under “Nvenc” or “Other Cuda Tools”? Trying to be as precise as possible with the report!

Historically, the way to indicate a feature request when filing via the bug reporting form is to prefix the synopsis with “RFE:”. The form changes every few years, so best to take a closer look whether there is a dedicated “enhancement request” box or selection now that you can check.
