I have a couple systems set up with 1080 Ti GPUs for neural net research. Unfortunate timing, in that the 2080 Ti became available shortly after.
Is there a major advantage in replacing these with 2080 Ti boards? Does the addition of Tensor Cores make a significant difference? I know that may be difficult to quantify, but there must be some benchmarks available.
Is it possible to combine 1080 Ti’s and 2080 Ti’s without conflicts? Is TensorFlow 2.0 able to use the Tensor Cores on the 2080 Ti, even when a 1080 Ti board is installed alongside it?
No comments yet, so I’m wondering…Is there a more appropriate forum for this particular question?
Tensor Cores can provide a 1.5-3x performance benefit. To realize this benefit, there are three general requirements.
You need to train in mixed precision so that the computationally intensive matrix multiplies and convolutions run in reduced precision. A float32 model can be converted to mixed precision with a simple optimizer wrapper: https://www.tensorflow.org/api_docs/python/tf/train/experimental/enable_mixed_precision_graph_rewrite
In order to efficiently feed the tensor cores, certain layer dimensions in your model need to be chosen or padded to multiples of 8. Generally this applies to batch size, hidden layer dimension, input/output channel counts, vocabulary size, and sequence lengths.
Your FP32 model needs to be compute-limited in the first place. If floating point throughput isn’t the bottleneck, accelerating it will have little effect. For example, you may be IO or CPU bound by your preprocessing pipeline or, for models with many tiny layers, bound by GPU kernel launch latency.
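To put a rough number on that third requirement, here is a small sketch based on Amdahl's law. It assumes you have already profiled your training step and know roughly what fraction of the step time is spent in matrix multiplies and convolutions. The function name and the example fractions are made up for illustration and are not measurements.

```python
def overall_speedup(compute_fraction, kernel_speedup):
    """Amdahl's law: speedup of a whole training step when only the
    compute-bound fraction of the step is accelerated.

    compute_fraction: share of step time spent in matmul/conv kernels (0..1)
    kernel_speedup:   how much faster those kernels run on Tensor Cores
    """
    if not 0.0 <= compute_fraction <= 1.0:
        raise ValueError("compute_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - compute_fraction) + compute_fraction / kernel_speedup)


# Hypothetical fractions, assuming the kernels themselves get 3x faster.
for fraction in (0.9, 0.5, 0.2):
    print(f"{fraction:.0%} compute-bound -> {overall_speedup(fraction, 3.0):.2f}x overall")
```

If only a small share of the step is spent in those kernels, the overall gain stays close to 1x no matter how fast the Tensor Cores are, so it is worth fixing the input pipeline first.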
Usually, with some tweaking and optimization, it is possible to get models to run well on Tensor Cores. See https://devblogs.nvidia.com/nvidia-automatic-mixed-precision-tensorflow/ for some more concrete examples.
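One of the simplest tweaks is the padding to multiples of 8 mentioned above. A minimal sketch of a helper for that follows. The helper name and the dimension values are hypothetical and only there to show the rounding; substitute the sizes from your own model.

```python
def pad_to_multiple(size, multiple=8):
    """Round size up to the nearest multiple (8 by default)."""
    if size < 0 or multiple <= 0:
        raise ValueError("size must be >= 0 and multiple must be > 0")
    # Ceiling division using integer arithmetic only.
    return -(-size // multiple) * multiple


# Hypothetical model dimensions, not taken from any real model.
dims = {"vocab_size": 33278, "hidden_dim": 1000, "batch_size": 30}
padded = {name: pad_to_multiple(value) for name, value in dims.items()}

for name in dims:
    print(f"{name}: {dims[name]} -> {padded[name]}")
```

Padding the vocabulary or channel count usually means adding unused rows or channels, so the extra entries should be masked or ignored in the loss.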
Combining 1080 and 2080 GPUs is technically allowed, but in multi-GPU training the work is almost always split evenly between devices. As a result, the 2080 ends up waiting for the 1080 to finish each step, and you would see the same performance as with two 1080 GPUs.
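The waiting effect can be shown with a toy model of synchronous training, assuming each GPU gets the same per-device batch and every step ends only when the slowest device has finished. The per-batch times below are invented for illustration and are not benchmarks of either card.

```python
def sync_step_time(per_gpu_times):
    """A synchronous step finishes when the slowest GPU finishes."""
    return max(per_gpu_times)


def throughput(batch_per_gpu, per_gpu_times):
    """Samples per second when every GPU processes the same batch size."""
    total_samples = batch_per_gpu * len(per_gpu_times)
    return total_samples / sync_step_time(per_gpu_times)


# Hypothetical seconds per batch of 64 samples (made-up numbers).
T_1080 = 0.30
T_2080 = 0.15

mixed = throughput(64, [T_1080, T_2080])
two_1080 = throughput(64, [T_1080, T_1080])
two_2080 = throughput(64, [T_2080, T_2080])

print(f"1080 + 2080: {mixed:.1f} samples/s")
print(f"2 x 1080:    {two_1080:.1f} samples/s")
print(f"2 x 2080:    {two_2080:.1f} samples/s")
```

Under these assumptions the mixed pair and the pair of 1080s come out identical, because the faster card sits idle for the rest of each step.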