Hello,
When running a video model training pipeline, I use GPU video decoding (decoding ~16 frames at a constant FPS). Decoding is performed with torchcodec, which calls cuvidMapVideoFrame internally. This results in thousands of ConvertNV12BLtoNV12 kernels being launched on a separate stream, in parallel with the compiled forward/backward passes of the model. Overall performance is similar to, or worse than, CPU video decoding, and it improves when I reduce the number of decoder threads and the prefetch factor, which points to SM contention.
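For context, the two knobs I mean are the size of the decode thread pool and how many decoded clips are allowed in flight ahead of the training step. A minimal, generic sketch of that bound (this is a stand-in for the pipeline, not torchcodec's actual API; `decode_clip`, `num_threads`, and `prefetch_factor` are illustrative names):

```python
from concurrent.futures import ThreadPoolExecutor
import itertools

_SENTINEL = object()

def bounded_map(fn, items, num_threads=2, prefetch_factor=2):
    """Apply fn over items with at most num_threads workers and at most
    num_threads * prefetch_factor tasks in flight. These are the two
    knobs (decoder threads, prefetch factor) that reduce SM contention
    between decode kernels and the model's compute kernels."""
    it = iter(items)
    max_inflight = num_threads * prefetch_factor
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Prime the queue up to the in-flight limit.
        pending = [pool.submit(fn, x) for x in itertools.islice(it, max_inflight)]
        idx = 0
        while idx < len(pending):
            yield pending[idx].result()  # preserve input order
            idx += 1
            nxt = next(it, _SENTINEL)
            if nxt is not _SENTINEL:
                pending.append(pool.submit(fn, nxt))

def decode_clip(i):
    # Stand-in for a GPU decode of one clip; in the real pipeline each
    # call ends up launching many ConvertNV12BLtoNV12 kernels.
    return i * 2

decoded = list(bounded_map(decode_clip, range(10), num_threads=2, prefetch_factor=2))
```

Lowering either knob caps how many decode kernels can contend with the model's kernels at any moment, but it trades away decode throughput, which is why I'd prefer batching the kernels instead.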
I’d like to know whether there is a way to batch the ConvertNV12BLtoNV12 kernels to eliminate the per-kernel launch overhead. I don’t see such an option at the API level.