Combine 1080 Ti and 2080 Ti for Tensorflow-based apps

MaxV · October 28, 2019, 6:15am

I have a couple systems set up with 1080 Ti GPUs for neural net research. Unfortunate timing, in that the 2080 Ti became available shortly after.

Questions:

Is there a major advantage in replacing these with 2080 Ti boards? Does the addition of Tensor Cores make a significant difference? I know that may be difficult to quantify, but there must be some benchmarks available.

Is it possible to combine 1080 Ti’s and 2080 Ti’s without conflicts? Is Tensorflow 2.0 able to use the Tensor Cores on the 2080 Ti, even when a 1080 Ti board is installed alongside it?

MaxV · October 30, 2019, 6:29am

No comments yet, so I’m wondering: is there a more appropriate forum for this particular question?

nluehr · November 13, 2019, 4:47pm

Tensor Cores can provide a 1.5-3x performance benefit. In order to realize this benefit, there are three general requirements.

1. You need to train in mixed precision so that the computationally intensive matrix multiplies and convolutions are computed in reduced precision. A float32 model can be converted to mixed precision by using a simple optimizer wrapper. https://www.tensorflow.org/api_docs/python/tf/train/experimental/enable_mixed_precision_graph_rewrite
2. In order to efficiently feed the Tensor Cores, certain layer dimensions in your model need to be chosen or padded to multiples of 8. Generally this applies to batch size, hidden layer dimension, input/output channel counts, vocabulary size, and sequence lengths.
3. Your fp32 model needs to be computationally limited to begin with. If floating point throughput isn’t the performance limiter, accelerating that throughput will have little effect. For example, you may be IO or CPU bound by your preprocessing pipeline or, for models with many tiny layers, bound by GPU kernel launch latency.
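
The padding rule in the second requirement can be sketched in a few lines of plain Python. The helper name `pad_to_multiple` and the example sizes are hypothetical and chosen only for illustration; they are not part of TensorFlow or of the post above.

```python
def pad_to_multiple(size, multiple=8):
    """Round size up to the nearest multiple (8 by default)."""
    if size <= 0 or multiple <= 0:
        raise ValueError("size and multiple must be positive")
    return ((size + multiple - 1) // multiple) * multiple


# Hypothetical model dimensions, for illustration only.
dims = {"batch_size": 30, "hidden_units": 500, "vocab_size": 33278}
padded = {name: pad_to_multiple(n) for name, n in dims.items()}
print(padded)
```

A dimension that is already a multiple of 8 is returned unchanged, so the helper can be applied to every size in a model configuration without checking each one first.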

Usually, with some tweaking and optimization, it is possible to get models to run well on Tensor Cores. See https://devblogs.nvidia.com/nvidia-automatic-mixed-precision-tensorflow/ for some more concrete examples.

Combining 1080 and 2080 GPUs is technically allowed, but in multi-GPU training, work is almost always split evenly between devices. As a result, the 2080 will end up waiting for the 1080 to complete each step and you would see the same performance as using 2x1080 GPUs.
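
The waiting effect can be illustrated with a rough back-of-the-envelope model. This sketch assumes synchronous data-parallel training with an even split of each batch, ignores communication overhead, and uses made-up throughput figures rather than benchmarks; `sync_step_time` is a hypothetical helper, not a TensorFlow function.

```python
def sync_step_time(global_batch, throughputs):
    """Seconds per step when the batch is split evenly and every device
    must finish before the next step starts (the slowest device dominates)."""
    share = global_batch / len(throughputs)
    return max(share / t for t in throughputs)


# Made-up throughputs in samples/second, NOT measured benchmarks.
SLOW, FAST = 100.0, 200.0

two_slow = sync_step_time(256, [SLOW, SLOW])
mixed = sync_step_time(256, [SLOW, FAST])
two_fast = sync_step_time(256, [FAST, FAST])
print(two_slow, mixed, two_fast)
```

Under these assumptions the mixed pair takes as long per step as two of the slower cards, because the faster card sits idle until the slower one finishes its half of the batch.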
