Best GPU for AI workloads (not DL training)

Hi everyone,
We would like to install an NVIDIA GPU in our lab server for AI workloads such as DL inference, math, image processing, and linear algebra (not so much DL training). I was thinking about the T4 due to its low power draw and support for lower precisions. But I’ve seen that the new RTX 3080/3090 have lower prices and high floating-point performance. My questions are the following:

  • Do the RTX GPUs have the same support as the data center GPUs like the T4 (e.g., for the CUDA-X libraries)?
  • Do the RTX GPUs support INT8 or INT4 workloads?
  • What do you recommend for the server in my lab for these use cases? (Budget is ~$2000, but less is preferred.)

Thanks in advance

With regard to point 2, the Ampere whitepaper may provide useful comparisons with the T4’s (Turing) Tensor Core performance: https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf

Looking at: NVIDIA Tesla T4 Specs | TechPowerUp GPU Database vs. NVIDIA GeForce RTX 3090 Specs | TechPowerUp GPU Database though, the T4 would appear to offer double the FP16 throughput.

I assume you mean US$2000? What do you envision being included in that price? At current inflated prices due to massive shortages of GPUs, an RTX 3090 alone might cost that much, if you can get one.

You might want to use the online system configuration tools of various system vendors to get a better idea of what a server with a high-end CPU, plenty of system memory, and potentially SSD mass storage will cost. You are probably looking at total system cost including GPU of $5000 or more.

I think he wants to add the card to an existing server.

@dim.danopoulos

Another thing to consider is that the T4 card is intended for sale by system integrators, as part of a complete system. I don’t believe Nvidia supports the sale of these as standalone cards, since they have no integrated cooling.

So if you are indeed just after a card and not a complete, certified system, the T4 is probably not the answer.

Agreed, looks like I had a senior moment there. I shall work on improving my reading comprehension.

Without knowing the specifics of the server configuration it is not really possible to know which GPUs will be suitable. For example, machines from major vendors may come with a PSU just sufficient for the configuration with which the system originally shipped (and may use a non-standard PSU form factor to boot), as I found out the hard way.

Thank you all for your immediate and valuable answers! So as far as I understand, the RTX 3080 has the same support as the T4 (libraries, Tensor ops, etc.).

  • You recommend the RTX because the T4 (as a passive card) might have heating issues in a non-certified server (while the RTX won’t)?
  • Another question is whether running, e.g., cuFFTs on the RTX 3080 would be ideal. Would the card reach 320 W in that case, or maybe only 100 W? My thought is that even though the RTX cards can use a lot of power, medium workloads might make them energy efficient?

Generally speaking, the performance of large FFTs is limited by memory bandwidth. So check the GPU specs for that.
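As a very rough illustration of that bandwidth-bound model, here is a sketch; the pass count and the ~900 GB/s effective bandwidth for a 3080-class card are assumptions for illustration, not measured values:

```python
# Crude lower-bound runtime estimate for a large memory-bound FFT:
# assume each pass reads and writes the whole array once, and that
# effective memory bandwidth is the limiting factor.
def fft_time_estimate_ms(n_points, bytes_per_elem, passes, bw_gb_s):
    bytes_moved = n_points * bytes_per_elem * 2 * passes  # read + write
    return bytes_moved / (bw_gb_s * 1e9) * 1e3

# 2**26-point single-precision complex FFT (8 bytes per element),
# assuming ~3 passes at ~900 GB/s effective bandwidth:
est_ms = fft_time_estimate_ms(2**26, 8, 3, 900)
print(f"estimated lower bound: ~{est_ms:.1f} ms")
```

The real cuFFT pass count depends on the transform size and plan, so treat this only as a sanity check against measured numbers.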

Passively-cooled GPUs are designed for servers: they need the correct amount of air flow across their heatsink fins, to be provided by the array of fans in the server. They also tend to have other requirements, e.g. of the system BIOS. NVIDIA sells these GPUs to system integrators and the support channel is the system integrator. If you buy a GPU-accelerated server from a system integrator on NVIDIA’s list, it should just work. If you cobble together your own system with a passively-cooled GPU, you may face various problems. We have had a fair number of posts to these forums reporting various issues, often due to insufficient cooling or system BIOS settings like large BAR.

I haven’t personally attempted to build a server with an actively-cooled consumer-grade GPU, so can’t speak to potential issues. I don’t specifically recall comments in these forums about issues with such a setup.

Thank you very much. I might look for a T4 with a qualified server like the HPE DL380 Gen10 or something similar.

Can I ask what your thinking is regarding power consumption?

My initial thought was that your server PSU cannot handle the power requirements of an RTX 3080/3090, but your last comment indicates you are prepared to replace the server as well as purchase a T4.

If it’s just a “power consumed vs. work done” issue: unless I’m mistaken, each successive generation of GPU uses less power to perform the same task. So although an RTX 3090 may draw more power while running your task, the T4 will have to run the task for longer, on previous-generation hardware, and in the end will have consumed more energy. And if you have purchased a new server and a T4 card, you will have spent a lot more money.
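To put numbers on the “power consumed vs. work done” idea, here is a sketch; the runtimes below are made up purely for illustration. The faster card only wins on energy if its speedup exceeds its power ratio:

```python
def task_energy_kj(power_w, runtime_s):
    """Energy consumed by one run of a task, in kilojoules."""
    return power_w * runtime_s / 1000.0

# Hypothetical runtimes for the same task on two cards:
low_power_kj = task_energy_kj(70, 400)    # 70 W card, 400 s -> 28 kJ
high_power_kj = task_energy_kj(350, 100)  # 350 W card, 4x faster -> 35 kJ

# Here the 4x speedup does not beat the 5x power ratio, so the
# slower, lower-power card consumes less energy overall.
print(low_power_kj, high_power_kj)
```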

Just a thought :)

With the end of Moore’s Law, the rate of efficiency improvement (GFLOPS/W) has slowed. The heuristic “a newer-generation GPU is more efficient than a previous-generation GPU” generally holds when comparing GPUs at roughly the same place in the performance hierarchy, e.g. an A100 with a V100, or an RTX 3060 with an RTX 2060.

The T4 is specified with a power draw of 70 W, which means it can get all its power from the PCIe slot. The RTX 3090 is specified with a power draw of 350 W and requires two 8-pin auxiliary PCIe power connectors. The performance advantage of the RTX 3090 over the T4 appears to be no higher than 4.5x across all metrics. This suggests that the efficiency of the T4 should be slightly better, while its absolute performance is significantly lower than the RTX 3090’s.
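That comparison can be made explicit with the spec-sheet numbers; note the 4.5x figure is a best-case upper bound, not a guarantee for any particular workload:

```python
# Work-per-watt comparison, RTX 3090 vs. T4, using the 70 W / 350 W
# board power figures and the ~4.5x best-case speedup of the 3090.
t4_power_w = 70
rtx3090_power_w = 350
max_speedup = 4.5  # best case for the 3090 across metrics

power_ratio = rtx3090_power_w / t4_power_w    # 5.0x the power
efficiency_ratio = max_speedup / power_ratio  # perf/W, 3090 vs. T4

print(f"RTX 3090 draws {power_ratio:.0f}x the power for at most "
      f"{max_speedup}x the performance")
print(f"-> its perf/W is at best ~{efficiency_ratio:.2f}x the T4's")
```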

True. Also, unless the card is being used heavily, the idle power consumption of the 3090 is likely to be higher as well.

That’s likely but hard to quantify without measuring. From what I have seen, older NVIDIA GPUs typically have idle power draw of 7%-8% of TDP, while the newest ones may have idle power draw as low as 3%-4% of TDP! The power management turns off any component not needed, reduces all clock frequencies to a minimum, reduces the capabilities of the PCIe interface to a minimum, drops supply voltages to a minimum. Whether it turns off the fan on some actively-cooled cards I do not know. I am not seeing that on any of my GPUs (fans go to something like 30% when idle), but it seems possible in theory.
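Plugging in those percentages gives a rough sense of the absolute numbers; the TDP values are from public spec sheets, and the idle fractions are the ballpark figures above, not measurements:

```python
def idle_watts(tdp_w, idle_fraction):
    """Estimated idle draw, given TDP and an assumed idle fraction."""
    return tdp_w * idle_fraction

# A 70 W card at the older-style 7-8% of TDP vs. a 350 W card at
# the newer 3-4% of TDP:
older_idle_w = idle_watts(70, 0.08)   # ~5.6 W
newer_idle_w = idle_watts(350, 0.04)  # ~14 W

print(f"70 W card idle estimate: ~{older_idle_w:.1f} W")
print(f"350 W card idle estimate: ~{newer_idle_w:.1f} W")
```

So even at a much lower idle percentage, a card with a large TDP can still have the higher absolute idle draw.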

What helps the T4 reduce power draw in general is the lack of a fan and the lack of a display connector. Because that interface needs to drive a high-frequency signal across a fairly long wire, it usually draws several watts, from my limited understanding of modern display interfaces. To my knowledge, a single GPU fan draws about 5 watts at full speed, but that information is dated and may vary quite widely based on the specifics of the fan design.

Also, I suspect that in the case of the T4, several minutes of intense workload can cause the card to throttle thermally. Can it damage the card if I run a heavy workload for, say, an hour? Also, do you have any values for the maximum operating temperatures? The 50 C that the T4 product brief states seems rather low to me.

From online reviews of the T4, thermal throttling is not uncommon under sustained heavy load, which is why the server needs to provide sufficient airflow. Apparently some don’t. I am not familiar with the T4 temperature ranges. I have never used one or even seen one, other than as a picture on a website.

Generally speaking, you cannot cause permanent damage by running a GPU hot [*]. Thermal throttling protects the GPU, and if that is not enough, thermal shutdown will occur. nvidia-smi can show the temperature limits for thermal throttling and thermal shutdown. They vary by GPU, but in recent GPUs it is usually something like this: throttling starts around 80 deg C, and shutdown happens at around 90 deg C. I operate one of my GPUs under an almost continuous thermal-throttling regime (due to fan management that doesn’t seem aggressive enough) and so far (1+ year) it works fine.

[*] Semiconductor components age through various physical phenomena such as electromigration in wires and hot carrier injection in CMOS transistors. These processes are accelerated when operating at higher temperatures. Some passive components such as capacitors also age due to chemical and physical processes. These are likewise accelerated by heat. With the exception of hot-running PSUs (power supplies) dying early, I am not personally aware of heat-induced component failures within a typical use time of five years (and for my personal systems, eight to nine years).


Bear in mind that, as long as the cooling requirements specified by Nvidia are met, the Tesla range of cards is designed to be run at full load for most of their lives.

Comparing the T4 to the RTX 2070 Super, which uses the same GPU, the base/boost clocks are 585/1590 MHz vs. 1605/1770 MHz, which was probably done with a view to longevity at load.

Later: The conservative clocks above may also have been chosen to keep the TDP at 70W, which is the limit for a PCIe slot with no auxiliary connector.

As a retired electronics tech, I’m a fan of cooling as much as possible, and for cards that are being purchased to be used at full load for extended periods of time, I believe watercooling is a worthwhile consideration. Not just from the reliability angle of lower temps: in most cases the GPU will no longer be thermally throttled, so a significant performance boost results.

I have just added a waterblock to a GTX 1080 in stock (not overclocked) form. Air-cooled, running a task that can last for several hours, the clock throttled down to 1609 MHz within minutes. Water-cooled, it runs at 1911 MHz, a gain of 19%.

Admittedly there is extra cost and some maintenance involved, but once done, the cooling circuit can be applied to replacement cards. Now that some card manufacturers are offering solid watercooled options, the issue of voiding warranties or damaging a card by adding the block yourself is avoided and some are offering an extra year of warranty over aircooled models.

These are a nice example, if one looks past the flashing lights and lurid coolant colour: RTX3090-24G-EK|Graphics Cards|ASUS USA

As an aside, if you haven’t already seen them, here is a review on the T4: Analysis of Our NVIDIA Tesla T4 Review - ServeTheHome

and also an inference benchmark, showing relative performance between the T4 and many other cards: NVIDIA GeForce RTX 3090 Review A Compute Powerhouse - Page 6 of 7


I had a look at the document and the 50 deg C appears in table 4 under “Environmental and Reliability Specifications”. So the range stated there (0 deg C to 50 deg C) refers to the temperature of the air flowing past the GPU. Note that some servers may be specified for higher operating temperatures, e.g. limit of 60 deg C, in which case the T4 would become the component that limits how hot the server is allowed to run. If memory serves, there are also some Intel CPUs that must run relatively cool and can likewise impose a limit on server temperature.


Yes, you are right. I guess I overlooked it. Anyway, thank you all for your important feedback!