NVDECs - I need more! (NVIDIA employees, please read)

Dear NVIDIA employees, this message is for you:

How do you determine which Quadro GPU gets how many NVDEC units? In my humble view, it does not make sense right now. In one generation, the Quadro RTX 3000 gets 3, the RTX 4000 gets 2, and the RTX 6000 gets one.
The next generation, the “Quadro” RTX A4000 gets one and the RTX A6000 gets 2.

From an integrated product development view, I would like to see some consistency. We cannot plan for the future this way.

Why? Well, for the solution I am working on right now, it is not important how many CUDA cores the card has. Only the decoding performance is important. And yours is the best! If you can give me 5 NVDECs in one single-slot card, I would be more than happy and it would solve all our problems.
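
For context on how we judge this: we just watch NVDEC utilization through NVML while a real decode load runs, and the decoder, not CUDA, is what saturates. A minimal sketch with the pynvml bindings (the package choice and GPU index 0 are my assumptions about the setup, not anything official):

```python
# Quick NVDEC load check via NVML (pip install nvidia-ml-py).
# Assumes the card under test is GPU index 0.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# NVML reports decoder utilization in percent plus its sampling period.
util, period_us = pynvml.nvmlDeviceGetDecoderUtilization(handle)
print(f"NVDEC utilization: {util}% (sampled over {period_us} us)")

pynvml.nvmlShutdown()
```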

So we were quite OK with the RTX 4000 having 2 NVDECs, as we could stack two of these in one 1U server and everything worked reasonably well. We were looking forward to the new generation, expecting it to still have at least 2 NVDECs.

But someone has decided that the RTX A4000 will only get one. So why did you do that? Who determines the count, and how? Can I somehow influence the product line, or ask for a custom GPU with unlocked units if the purchase would be in the hundreds of cards? Please give me anything, I am starting to get quite desperate over the NVDEC count. And no, the A100 is not the answer: too many CUDA cores, too much power consumption, too pricey, and I still need a display output in the end.

Thank you in advance :)

So a new low-profile desktop version of the RTX A2000 came out… but it has 1 NVDEC instead of the 2 NVDECs on the laptop version. Again, why? I don’t understand!

Hi @teslan223, it seems like the A16 could be the right pick for you. It has 4x GPU chips with 1x encoder and 2x decoders each, for a total of 4x NVENC and 8x NVDEC on the board. See more details here: A16 GPU : Take Remote Work to the Next Level | NVIDIA.

Hello @gpolaillon, the A16 is a very interesting piece, and I can see it as a way to upgrade our systems to 4K. But for now it is far from ideal:

Yes, the price per NVDEC unit will probably be about the same as on the Quadro RTX 4000 (circa 600-800 USD ???). However, the overall performance is overkill, just like the overall price. I hope you see the problem: there is not even a “good enough” choice somewhere in the middle, like the RTX 4000 was. If I want Ampere, I can either pay 4500+ USD for 8 NVDECs (roughly 560 USD per unit) or 1300 USD for one.

If the GA106 has 2x NVDECs built in (apparently it does), the ideal scenario would be to enable both of them on the desktop A2000 and (ideally, possibly?) make a classic single-slot version à la P2200.

Actually, the A16 is 4x GA107, so almost 4x A2000 on one board (slightly less powerful). So unless I’m getting something wrong, I don’t think it is overkill, just a dense encoder/decoder board that we designed for customers interested in pure encoding/decoding performance more than CUDA cores. That sounded like what you were looking for :). Yes, pricing is slightly higher than 4x A2000, but it all goes on a single board, so you could get higher server density overall. That’s the best we have for pure encoding/decoding workflows with less GPU performance/power.

Sorry for the mismatch, it is fairly hard to find official info on which chip is on which card :)

Yes, there are a few problems, or missed points if you wish. I will try to explain; hopefully we’ll reach an understanding:

  1. Not all products live in the cloud and are only about processing. My solution is an integrated one: it definitely needs a display output, plus additional SDI IN/OUTs.
  2. The point of the solution was to be a “budget” system, cheaper than what we had before. The rationale for even creating the system was the existence of the Quadro RTX 4000 and 3000 (ask Dell ;)), which offered multiple NVDECs relatively cheaply. We would not even have tried if these were not available. And we honestly counted on the NVDEC count at that level increasing in newer generations, not decreasing.
  3. The system has to be flexible: we need approx. 1 NVDEC (at clocks similar to the RTX 4000) per 3x 1080p50 inputs for everything to work seamlessly (if you cannot imagine why: we play at up to 4x speed, in both directions!). And this should be expandable to 12 inputs (where other limits kick in), so 4-5 NVDECs is ideal for the top system; see the rough calculation after this list. Every input has its final price.
  4. There is a little bit of OpenGL and a little bit of CUDA going on on top of that - nothing that just over 2000 CUDA cores would not handle. Even that is too much: when we sometimes need to buy a second RTX 4000, its performance other than decoding basically goes to waste.
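
To make point 3 concrete, here is the back-of-the-envelope sizing we use - a sketch built on our own empirical ratio (1 RTX 4000-class NVDEC per 3x 1080p50 inputs), nothing official:

```python
import math

# Empirical rule of thumb from our RTX 4000 deployments: ~1 NVDEC
# (at RTX 4000-class clocks) handles 3x 1080p50 inputs, including
# scrubbing at up to 4x speed in both directions.
INPUTS_PER_NVDEC = 3

def nvdecs_needed(inputs: int) -> int:
    """Estimate the NVDEC count needed for a given number of 1080p50 inputs."""
    return math.ceil(inputs / INPUTS_PER_NVDEC)

for n in (3, 6, 12):
    print(f"{n:2d} inputs -> {nvdecs_needed(n)} NVDEC(s)")
# 12 inputs -> 4 NVDECs; with a little headroom, that is the 4-5 above.
```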

As you can see, the A16 definitely IS overkill (for me), and the Quadro RTX 4000 basically has no sufficient successor in the Ampere generation at its price level. And it is especially hard to explain to our suppliers that the A4000, “a better card for the same price”, has much less value for us.

So what I am complaining about is the inter-generational inconsistency/deterioration and the lack of imagination on NVIDIA’s side. I can almost hear somebody in the process saying: “Pff, who would need 2 NVDECs in the A4000? Nobody would use that; why did we even do it in Turing?.. We will do a server GPU for video, because who would need something like that for anything other than processing?..”

Sorry for taking up your time with this, and hopefully you understand the problem better now :)