Can NVIDIA L40S effectively replace A16 for high-density 1080p H.264/H.265 NVENC transcoding?

We are evaluating NVIDIA GPUs for a production VOD transcoding system focused on H.264 and H.265 (HEVC) at 1080p resolution.

The primary requirement is high-density concurrent NVENC throughput, rather than GPU compute or AI workloads.

We are currently comparing NVIDIA A16 (high NVENC density) with NVIDIA L40S (fewer NVENC engines but higher compute and memory bandwidth).

While we understand NVIDIA does not publish fixed FPS benchmarks, we would appreciate architectural or practical guidance from engineers or users with real-world experience:

  • Can the L40S reasonably substitute for the A16 in high-density 1080p H.264/H.265 VOD transcoding?

  • In practice, how does the aggregate NVENC throughput of the L40S compare to that of the A16 under such workloads?

  • Are there specific scenarios where the L40S becomes the better choice despite its lower NVENC density?

Any insights, high-level comparisons, or real-world observations would be very helpful.

Hi there @vuongtrannguyenkhoi, welcome to the NVIDIA developer forums.

While I cannot share real-life examples or hands-on experience with your kind of setup, our docs do provide some baseline performance comparisons that might be helpful.

For the latest Video Codec SDK, you can find the decode numbers here: NVDEC Application Note - NVIDIA Docs
and the encode numbers here: NVENC Application Note - NVIDIA Docs
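Beyond the application notes, the most reliable answer usually comes from measuring on the actual cards. A minimal sketch of such a measurement, assuming an FFmpeg build with NVENC support and a local 1080p test clip (`input_1080p.mp4` is a placeholder, not an official sample), could look like this:

```shell
#!/bin/sh
# Hypothetical benchmark sketch: launch N concurrent 1080p HEVC NVENC
# transcodes on one GPU and read back each session's realtime speed factor.
# Increase N until the per-session speed drops below 1x to find the
# saturation point of the card's NVENC engines.
N=8
for i in $(seq 1 "$N"); do
  ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda \
         -i input_1080p.mp4 \
         -c:v hevc_nvenc -preset p4 -b:v 5M \
         -f null - 2> "session_$i.log" &
done
wait
# Extract the last reported "speed=...x" value from each session log;
# the sum across sessions approximates aggregate NVENC throughput.
for i in $(seq 1 "$N"); do
  grep -o 'speed=[0-9.]*x' "session_$i.log" | tail -n 1
done
```

Repeating the same run on both an A16 and an L40S (and swapping `hevc_nvenc` for `h264_nvenc`) would give you a direct, workload-specific comparison rather than relying on generic figures.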

For reference, the A16 is Ampere generation and the L40S is Ada Lovelace generation.

I hope this helps.