cuvidCreateDecoder return error CUDA_ERROR_OUT_OF_MEMORY

m.luht · September 16, 2025, 5:24pm

Hi everybody.
I am using hardware decoding on A100 and H100 GPUs and trying to use the full capacity of their NVDEC engines.
The H100 has 7 NVDEC engines, and each one decodes H.264 at Full HD resolution at no less than 903 FPS.

(This figure is for the Ada architecture; for some reason no figures are published for Hopper. Where can I find this information?)

So the total decoding performance is 903 × 7 = 6321 FPS. To decode 25 FPS video streams, I need 6321 FPS / 25 FPS = 252 decoders. But I cannot create that many: I only manage to create 245-247 decoders, and the next call to cuvidCreateDecoder fails with CUDA_ERROR_OUT_OF_MEMORY.

How can I create even more decoders if my video streams are 15 FPS? In that case I need 6321 FPS / 15 FPS = 421 decoders. Does that mean about half of the NVDEC engines' capacity will sit idle?
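The capacity arithmetic above can be written out as a few compile-time helpers. This is a sketch that assumes the Ada figure of 903 FPS per NVDEC engine also holds for Hopper, which the post itself could not confirm; `totalFps`, `streamsSupported` and `utilization` are illustrative names, not part of any SDK.

```cpp
// Sketch of the capacity arithmetic from the post. It assumes the Ada figure
// of 903 FPS per NVDEC engine for H.264 1080p also applies to Hopper.

// Aggregate decode rate of all engines, in frames per second.
constexpr int totalFps(int fpsPerEngine, int engines) {
    return fpsPerEngine * engines;
}

// Real-time streams of `streamFps` that the aggregate rate can sustain;
// integer division rounds down to whole streams.
constexpr int streamsSupported(int fpsPerEngine, int engines, int streamFps) {
    return totalFps(fpsPerEngine, engines) / streamFps;
}

// Fraction of the aggregate rate consumed by `decoders` streams of `streamFps`.
constexpr double utilization(int decoders, int streamFps, int fpsPerEngine,
                             int engines) {
    return static_cast<double>(decoders) * streamFps /
           totalFps(fpsPerEngine, engines);
}
```

Under the same assumption, the 247 decoders that could actually be created would use about 98% of the capacity with 25 FPS streams (247 × 25 = 6175 FPS) but only about 59% with 15 FPS streams (247 × 15 = 3705 FPS), leaving roughly 41% of it idle.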

I have read many topics about this problem, but none of them has a solution.

One answer says that the number of decoders you can create is limited by system resources. Which resources are these? The same answer claims that NVML can provide information telling how many decoders a particular GPU can support. What information is this?

I have tried several ways of using contexts: one context for all decoders, one context per decoder, and 4 contexts with 70 decoders distributed among them. None of this solved the problem.

The only thing that increased the number of decoders was creating them from 4 separate processes; in total I managed to create 287-290 decoders.

In fact, this problem occurs on any GPU.

To create so many decoders, I used NvDecoder.cpp/NvDecoder.h from the Video Codec SDK samples.
I modified them slightly to reach this number of decoders: all decoders must share a global mutex, which is held when calling cuvidCreateVideoParser and cuvidParseVideoData. Without this fix, even fewer decoders can be created, about 160.
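The global-mutex change described above could look roughly like the following. This is a minimal sketch, not the actual patch: `globalParserMutex`, `withGlobalParserLock` and `countUnderLock` are hypothetical names, and the commented NvDecoder.cpp lines assume the names used in the SDK sample (`m_hParser`, `videoParserParameters`, `packet`), which may differ between SDK releases. `countUnderLock` exists only so the locking can be checked without a GPU.

```cpp
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: one process-wide mutex shared by every NvDecoder
// instance (a function-local static is initialized once, thread-safely).
inline std::mutex &globalParserMutex() {
    static std::mutex m;
    return m;
}

// Runs `call` while holding the global mutex and returns its result, so at
// most one thread at a time is inside the wrapped call.
template <typename Call>
auto withGlobalParserLock(Call &&call) -> decltype(std::forward<Call>(call)()) {
    std::lock_guard<std::mutex> lock(globalParserMutex());
    return std::forward<Call>(call)();
}

// In a modified NvDecoder.cpp the two calls might then read as follows
// (names assumed from the SDK sample; check them against your copy):
//   NVDEC_API_CALL(withGlobalParserLock([&] {
//       return cuvidCreateVideoParser(&m_hParser, &videoParserParameters); }));
//   NVDEC_API_CALL(withGlobalParserLock([&] {
//       return cuvidParseVideoData(m_hParser, &packet); }));

// GPU-free self-check: many threads bump a plain (non-atomic) counter
// through the wrapper; because the lock is held, the final count is exact.
inline long countUnderLock(int threads, int iterations) {
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i)
                withGlobalParserLock([&counter] { return ++counter; });
        });
    for (auto &th : pool)
        th.join();
    return counter;
}
```

Because the NVCUVID parser invokes its callbacks from inside cuvidParseVideoData, holding this lock also serializes the cuvidCreateDecoder and cuvidDecodePicture calls that NvDecoder makes in its sequence and decode callbacks. That may be part of why the change raises the decoder count, but it also means only one decoder at a time can submit work, which could limit aggregate throughput.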

Here is my code for creating the context, threads, and decoders:

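```cpp
// NvDecoder, FFmpegDemuxer, ck(), LOG() and createCudaContext() come from the
// Video Codec SDK samples; adjust the include paths to your SDK layout.
// cfg, scfg::init, log_a and log_e are my own helpers (not shown).
#include <chrono>
#include <exception>
#include <thread>
#include <vector>

#include "NvDecoder.h"
#include "FFmpegDemuxer.h"
#include "NvCodecUtils.h"
#include "AppDecUtils.h"

// Decodes the whole input file once, then keeps the thread (and therefore
// the decoder) alive so that all decoders exist at the same time.
void
process(CUcontext ctx, NvDecoder *dec, size_t num)
{
    ck(cuCtxSetCurrent(ctx));
    FFmpegDemuxer demuxer(cfg.file.c_str());

    int nVideoBytes = 0;
    int nFrameReturned = 0;
    int nFrame = 0;
    uint8_t *pVideo = NULL;
    uint8_t *pFrame = NULL;
    do {
        demuxer.Demux(&pVideo, &nVideoBytes);
        nFrameReturned = dec->Decode(pVideo, nVideoBytes);
        if (!nFrame && nFrameReturned)
            LOG(INFO) << dec->GetVideoInfo();
        for (int i = 0; i < nFrameReturned; i++) {
            pFrame = dec->GetLockedFrame();
            // Return the buffer to the decoder's frame pool. A plain `delete`
            // is undefined behaviour here: host frames are allocated with new[].
            dec->UnlockFrame(&pFrame);
        }
        nFrame += nFrameReturned;
    } while (nVideoBytes);
    // Never return: this decoder must stay alive while the others are created.
    while (true)
        std::this_thread::sleep_for(std::chrono::seconds(600));
}

int
main(int argc, char **argv)
{
    ck(cuInit(0));

    try {
        cfg = scfg::init(argc, argv);
    } catch (std::exception &ex) {
        log_a("{}", ex.what());
        return 1;
    }

    // One context shared by all decoders (one of the configurations I tried).
    CUcontext ctx = NULL;
    createCudaContext(&ctx, cfg.device_number, 0);

    std::vector<std::thread> threads;
    for (size_t i = 0; i < cfg.number; i++) {
        // bUseDeviceFrame = false: decoded frames are copied to host memory.
        auto *dec = new NvDecoder(ctx, false, cudaVideoCodec_H264);
        // CUcontext is a handle, so capture it by value.
        threads.emplace_back([ctx, dec, i] {
            try {
                process(ctx, dec, i);
            } catch (const std::exception &ex) {
                log_e("{}", ex.what());
            }
        });
    }

    // process() never returns, so this blocks for the lifetime of the test.
    for (auto &t : threads)
        t.join();

    return 0;
}
```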
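To find the limit in one place rather than across many decode threads, creation can also be probed sequentially until the first failure. This is a generic sketch under stated assumptions: `ProbeResult` and `probeCapacity` are hypothetical names, and for NVDEC the factory is assumed to create one decoder, feed it enough data that cuvidCreateDecoder actually runs (in NvDecoder that happens when the parser reaches the first sequence header during Decode), and throw on failure, as the catch block in the code above expects.

```cpp
#include <cstddef>
#include <exception>
#include <memory>
#include <string>
#include <vector>

// Outcome of a capacity probe: how many objects exist after the probe, and
// the message of the failure that stopped it (empty if `limit` was reached).
struct ProbeResult {
    std::size_t created = 0;
    std::string firstError;
};

// Calls `make(index)` until it throws or `alive` holds `limit` objects.
// Every created object is kept in `alive`, so earlier allocations are not
// released while later ones are attempted.
template <typename T, typename Factory>
ProbeResult probeCapacity(Factory make, std::size_t limit,
                          std::vector<std::unique_ptr<T>> &alive) {
    ProbeResult result;
    while (alive.size() < limit) {
        try {
            alive.push_back(make(alive.size()));
        } catch (const std::exception &ex) {
            result.firstError = ex.what();
            break;
        }
    }
    result.created = alive.size();
    return result;
}
```

A factory wrapping `new NvDecoder(ctx, false, cudaVideoCodec_H264)` would return a `std::unique_ptr<NvDecoder>`; `firstError` then holds the message of the exception that stopped the probe. Keeping every object in `alive` matters, because releasing earlier decoders would free their resources and inflate the count.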
