Best GPU for AI workloads (not DL training)

Hi everyone,
We would like to install an NVIDIA GPU in our lab server for AI workloads such as DL inference, math, image processing, and linear algebra (not so much DL training). I was thinking about the T4 due to its low power draw and support for lower precisions. But I’ve seen that the new RTX 3080/3090 have lower prices and high floating-point performance. My questions are the following:

  • Do the RTX GPUs have the same support as the data center GPUs like the T4 (e.g., for the CUDA-X libraries)?
  • Do the RTX GPUs support INT8 or INT4 workloads?
  • What do you recommend for the server in my lab for these use cases? (Budget is ~$2000, but less is preferred.)

Thanks in advance

With regard to point 2, the Ampere whitepaper may provide useful comparisons with the T4’s (Turing) Tensor Core performance: https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf

Looking at: NVIDIA Tesla T4 Specs | TechPowerUp GPU Database vs. NVIDIA GeForce RTX 3090 Specs | TechPowerUp GPU Database though, the T4 would appear to offer double the FP16 throughput.

I assume you mean US$2000? What do you envision being included in that price? At current inflated prices due to massive shortages of GPUs, an RTX 3090 alone might cost that much, if you can get one.

You might want to use the online system configuration tools of various system vendors to get a better idea of what a server with a high-end CPU, plenty of system memory, and potentially SSD mass storage will cost. You are probably looking at total system cost including GPU of $5000 or more.

I think he wants to add the card to an existing server.

@dim.danopoulos

Another thing to consider is that the T4 card is intended for sale by system integrators, as part of a complete system. I don’t believe Nvidia supports the sale of these as standalone cards, since they have no integrated cooling.

So if you are indeed just after a card and not a complete, certified system, the T4 is probably not the answer.

Agreed, looks like I had a senior moment there. I shall work on improving my reading comprehension.

Without knowing the specifics of the server configuration it is not really possible to know which GPUs will be suitable. For example, machines from major vendors may come with a PSU just sufficient for the configuration with which the system originally shipped (and may use a non-standard PSU form factor to boot), as I found out the hard way.

Thank you all for your immediate and valuable answers! So as far as I understand, the RTX 3080 has the same support as the T4 (libraries, Tensor ops, etc.).

  • You recommend the RTX because the T4 (as a passive card) might have heating issues in a non-certified server (while the RTX won’t)?
  • Another question is whether running, e.g., cuFFTs on the RTX 3080 would be ideal. Would the card reach 320 W in that case, or maybe only 100 W? My thought is that even though the RTX cards can use a lot of power, medium workloads might make them energy efficient?

Generally speaking, the performance of large FFTs is limited by memory bandwidth. So check the GPU specs for that.
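As a very rough illustration of that bandwidth-bound model, here is a sketch; the pass count and the ~900 GB/s effective bandwidth for a 3080-class card are assumptions for illustration, not measured values:

```python
# Crude lower-bound runtime estimate for a large memory-bound FFT:
# assume each pass reads and writes the whole array once, and that
# effective memory bandwidth is the limiting factor.
def fft_time_estimate_ms(n_points, bytes_per_elem, passes, bw_gb_s):
    bytes_moved = n_points * bytes_per_elem * 2 * passes  # read + write
    return bytes_moved / (bw_gb_s * 1e9) * 1e3

# 2**26-point single-precision complex FFT (8 bytes per element),
# assuming ~3 passes at ~900 GB/s effective bandwidth:
est_ms = fft_time_estimate_ms(2**26, 8, 3, 900)
print(f"estimated lower bound: ~{est_ms:.1f} ms")
```

The real cuFFT pass count depends on the transform size and plan, so treat this only as a sanity check against measured numbers.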

Passively-cooled GPUs are designed for servers: they need the correct amount of air flow across their heatsink fins, to be provided by the array of fans in the server. They also tend to have other requirements, e.g. of the system BIOS. NVIDIA sells these GPUs to system integrators and the support channel is the system integrator. If you buy a GPU-accelerated server from a system integrator on NVIDIA’s list, it should just work. If you cobble together your own system with a passively-cooled GPU, you may face various problems. We have had a fair number of posts to these forums reporting various issues, often due to insufficient cooling or system BIOS settings like large BAR.

I haven’t personally attempted to build a server with an actively-cooled consumer-grade GPU, so can’t speak to potential issues. I don’t specifically recall comments in these forums about issues with such a setup.

Thank you very much. I might look for a T4 with a qualified server like the HPE DL380 Gen10 or something similar.

Can I ask what your thinking is regarding power consumption?

My initial thought was that your server PSU cannot handle the power requirements of an RTX 3080/3090, but your last comment indicates you are prepared to replace the server as well as purchase a T4.

If it’s just a “power consumed vs. work done” issue: unless I’m mistaken, each successive generation of GPU uses less power to perform the same task. So although an RTX 3090 may draw more power while running your task, the T4 will have to run the task for longer, on previous-generation hardware, and in the end will have consumed more energy. And if you have purchased a new server and a T4 card, you will have spent a lot more money.
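To put numbers on the “power consumed vs. work done” idea, here is a sketch; the runtimes below are made up purely for illustration. The faster card only wins on energy if its speedup exceeds its power ratio:

```python
def task_energy_kj(power_w, runtime_s):
    """Energy consumed by one run of a task, in kilojoules."""
    return power_w * runtime_s / 1000.0

# Hypothetical runtimes for the same task on two cards:
low_power_kj = task_energy_kj(70, 400)    # 70 W card, 400 s -> 28 kJ
high_power_kj = task_energy_kj(350, 100)  # 350 W card, 4x faster -> 35 kJ

# Here the 4x speedup does not beat the 5x power ratio, so the
# slower, lower-power card consumes less energy overall.
print(low_power_kj, high_power_kj)
```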

Just a thought :)

With the end of Moore’s Law, the rate of efficiency improvement (GFLOPS/W) has slowed. The heuristic “a newer-generation GPU is more efficient than a previous-generation GPU” generally holds when comparing GPUs at roughly the same place in the performance hierarchy, e.g. an A100 with a V100, or an RTX 3060 with an RTX 2060.

The T4 is specified with a power draw of 70 W, which means it can get all its power from the PCIe slot. The RTX 3090 is specified with a power draw of 350 W and requires two 8-pin auxiliary PCIe power connectors. The performance advantage of the RTX 3090 over the T4 appears to be no higher than 4.5x across all metrics. This suggests that the efficiency of the T4 should be slightly better, while its absolute performance is significantly lower than the RTX 3090’s.
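That comparison can be made explicit with the spec-sheet numbers; note the 4.5x figure is a best-case upper bound, not a guarantee for any particular workload:

```python
# Work-per-watt comparison, RTX 3090 vs. T4, using the 70 W / 350 W
# board power figures and the ~4.5x best-case speedup of the 3090.
t4_power_w = 70
rtx3090_power_w = 350
max_speedup = 4.5  # best case for the 3090 across metrics

power_ratio = rtx3090_power_w / t4_power_w    # 5.0x the power
efficiency_ratio = max_speedup / power_ratio  # perf/W, 3090 vs. T4

print(f"RTX 3090 draws {power_ratio:.0f}x the power for at most "
      f"{max_speedup}x the performance")
print(f"-> its perf/W is at best ~{efficiency_ratio:.2f}x the T4's")
```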

True. Also, unless the card is being used heavily, the idle power consumption of the 3090 is likely to be higher as well.

That’s likely but hard to quantify without measuring. From what I have seen, older NVIDIA GPUs typically have idle power draw of 7%-8% of TDP, while the newest ones may have idle power draw as low as 3%-4% of TDP! The power management turns off any component not needed, reduces all clock frequencies to a minimum, reduces the capabilities of the PCIe interface to a minimum, drops supply voltages to a minimum. Whether it turns off the fan on some actively-cooled cards I do not know. I am not seeing that on any of my GPUs (fans go to something like 30% when idle), but it seems possible in theory.
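Plugging in those percentages gives a rough sense of the absolute numbers; the TDP values are from public spec sheets, and the idle fractions are the ballpark figures above, not measurements:

```python
def idle_watts(tdp_w, idle_fraction):
    """Estimated idle draw, given TDP and an assumed idle fraction."""
    return tdp_w * idle_fraction

# A 70 W card at the older-style 7-8% of TDP vs. a 350 W card at
# the newer 3-4% of TDP:
older_idle_w = idle_watts(70, 0.08)   # ~5.6 W
newer_idle_w = idle_watts(350, 0.04)  # ~14 W

print(f"70 W card idle estimate: ~{older_idle_w:.1f} W")
print(f"350 W card idle estimate: ~{newer_idle_w:.1f} W")
```

So even at a much lower idle percentage, a card with a large TDP can still have the higher absolute idle draw.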

What helps the T4 reduce power draw in general is the lack of a fan and the lack of a display connector. Because that interface needs to drive a high-frequency signal across a fairly long wire, it usually draws several watts, from my limited understanding of modern display interfaces. To my knowledge, a single GPU fan draws about 5 watts at full speed, but that information is dated and may vary quite widely based on the specifics of the fan design.

Also, I suspect that in the case of the T4, several minutes of intense workload can cause the card to throttle thermally. Can it damage the card if I run a heavy workload for, say, an hour? Also, do you have any values for the maximum operating temperatures? The 50 C that the T4 product brief states seems rather low to me.

From online reviews of the T4, thermal throttling is not uncommon under sustained heavy load, which is why the server needs to provide sufficient airflow. Apparently some don’t. I am not familiar with the T4 temperature ranges. I have never used one or even seen one, other than as a picture on a website.

Generally speaking, you cannot cause permanent damage by running a GPU hot [*]. Thermal throttling protects the GPU, and if that is not enough, thermal shutdown will occur. nvidia-smi can show the temperature limits for thermal throttling and thermal shutdown. They vary by GPU, but in recent GPUs it is usually something like this: throttling starts around 80 deg C, and shutdown happens at around 90 deg C. I operate one of my GPUs under an almost continuous thermal-throttling regime (due to fan management that doesn’t seem aggressive enough) and so far (1+ year) it works fine.

[*] Semiconductor components age through various physical phenomena such as electromigration in wires and hot carrier injection in CMOS transistors. These processes are accelerated when operating at higher temperatures. Some passive components such as capacitors also age due to chemical and physical processes. These are likewise accelerated by heat. With the exception of hot-running PSUs (power supplies) dying early, I am not personally aware of heat-induced component failures within a typical use time of five years (and for my personal systems, eight to nine years).


Bear in mind that, as long as the cooling requirements specified by Nvidia are met, the Tesla range of cards is designed to be run at full load for most of their lives.

Comparing the T4 to the RTX 2070 Super, which uses the same GPU, the base/boost clocks are 585/1590 MHz vs. 1605/1770 MHz, which was probably done with a view to longevity at load.

Later: The conservative clocks above may also have been chosen to keep the TDP at 70W, which is the limit for a PCIe slot with no auxiliary connector.

As a retired electronics tech, I’m a fan of cooling as much as possible, and for cards that are being purchased to be used at full load for extended periods of time, I believe watercooling is a worthwhile consideration. Not just from the reliability angle of lower temps: in most cases the GPU will no longer be thermally throttled, so a significant performance boost results.

I have just added a waterblock to a GTX 1080 in stock (not overclocked) form. Air-cooled, running a task that can last for several hours, the clock throttled down to 1609 MHz within minutes. Water-cooled, it runs at 1911 MHz, a gain of 19%.

Admittedly there is extra cost and some maintenance involved, but once done, the cooling circuit can be applied to replacement cards. Now that some card manufacturers are offering solid watercooled options, the issue of voiding warranties or damaging a card by adding the block yourself is avoided and some are offering an extra year of warranty over aircooled models.

These are a nice example, if one looks past the flashing lights and lurid coolant colour: RTX3090-24G-EK|Graphics Cards|ASUS USA

As an aside, if you haven’t already seen them, here is a review on the T4: Analysis of Our NVIDIA Tesla T4 Review - ServeTheHome

and also an inference benchmark, showing relative performance between the T4 and many other cards: NVIDIA GeForce RTX 3090 Review A Compute Powerhouse - Page 6 of 7


I had a look at the document and the 50 deg C appears in table 4 under “Environmental and Reliability Specifications”. So the range stated there (0 deg C to 50 deg C) refers to the temperature of the air flowing past the GPU. Note that some servers may be specified for higher operating temperatures, e.g. limit of 60 deg C, in which case the T4 would become the component that limits how hot the server is allowed to run. If memory serves, there are also some Intel CPUs that must run relatively cool and can likewise impose a limit on server temperature.


Yes, you are right. I guess I overlooked it. Anyway, thank you all for your important feedback!