Hi all,
I am new to this forum and after searching its topics (and before that, the web), I could not find an answer to my questions. So, I decided to create this new thread.
I am trying to evaluate which is the best gfx card to use for CUDA programming when just single precision floating point operations are important.
First, I am going to list the features and pros & cons, I found out, then, I am going to ask questions I am in doubt. I hope, this is the proper forum for asking these questions and, more important, someone can reliably answer my questions ;)
The pros (+) and cons (-) I found out about Tesla K40, Quadro M6000, GeForce Titan X are as follows:
Tesla K40 - Features: 2880 Cores, 12GB Ram, 288 GB/s, 384-bit bus, Boost Clock: 810-875MHz, Core Clock: 745MHz, 6GHz GDDR5, 4.29 Tflops SP
Pros:
- pure compute capabilities (no video output)
- reliability (since produced and certified by NVIDIA, long-time warranty, strenuous long time zero error tolerance testing)
- Memory error protection (ECC)
- decent double precision performance (1.66 Tflops)
- Hyper-Q
- two DMA engines for bi-directional data copying (while a kernel is operating)
- TCC driver for Windows
Cons:
- “old” Kepler GPU architecture (a Tesla-Maxwell card will most likely not appear due to Maxwell´s poor DP performance)
- around 5000€
Quadro M6000 - Features: 3072 Cores, 12GB Ram, 317 GB/s, 384-bit bus, Boost Clock: 1140MHz, Core Clock: 988MHz, 6.6 GHz DDR5, 6.07 Tflops SP
Pros:
- reliability (since produced and certified by NVIDIA, uncertain: with tests and warranty equal to Tesla?)
- Memory error protection (ECC)
- two DMA engines for bi-directional data copying (while a kernel is operating)
- latest Maxwell-2 GPU architecture
- highly tuned (video?) driver for professional applications
- TCC driver for Windows
Cons:
- pure double precision performance (0.19 TFlops)
- around 6000 €
GeForce Titan X - Features: 3072 Cores, 12GB Ram, 336GB/s, 384-bit bus, Boost Clock: 1075MHz, Core Clock 1000MHz, 7GHz DDR5, 6.2 TFLOPS SP
Pros:
- two DMA engines for bi-directional data copying (while a kernel is operating)
- latest Maxwell-2 GPU architecture
- 1250 €
Cons:
- produced by 3rd party companies, not by NVIDIA
- poor double precision performance (0.192 TFlops)
- WDDM driver (Windows Display Driver Model) – might be not a problem when used as a secondary card without display?
Questions:
- Is it true that the Quadro M6000 indeed has “two DMA engines for bi-directional data copying (while a kernel is operating)”? I have not found reliable information about that.
(Was: As I understand it, Hyper-Q is important when using multiple GPUs in an CUDA compute environment. Is there something similar for Maxwell GPUs? Is it in any way related to SLI?)
Correction:
2) As I understand it, Hyper-Q is important when multiple CPUs/Processes access the same single GPU. Is there something similar for Maxwell GPUs? Is it true, that it is completely unrelated to SLI?
-
Is it true, that on windows, the WDDM problems disappear for a GeForce (Titan X) GFX card, when it is used as secondary card with no monitor connected. Can the Titan X be used with the TCC driver in such a case?
-
When only single precision floating point operations are important in hand-written CUDA programs or with CUDA related libraries like Thurst, cuBLASS, etc, does a Maxwell card like Quadro M6000 or GeForce Titan X achieve more performance than the Kepler card Tesla K40? (I am aware, that there is also a Tesla K80, which besically is a Tesla K40 with 2 GPUs. However, for a start, I just want to program on 1 GPU).
-
Is it possible at all to run a Quadro M6000 or Geforce Titan X in a server (not a desktop workstation) without a monitor? For develoement, it might make sense to start with a Titan X and later switch to a Tesla/Quadro when the product is mature enough.
Thanks in advance for your help!
EDIT: Applied some corrections to question 2) and to pros/cons of Titan X due to helpful responses in this thread.