I have been looking to build a computing platform from the Fermi cards.
The issue now is deciding between the GTX 590 and the Tesla C2070.
I fail to understand what the Tesla can achieve that the GTX 590 cannot, except for the support of greater memory.
Here is a basic refresher:
GTX 590 (from the manufacturer's website)
1024 CUDA cores
GPU clock: 630 MHz
Memory clock: 1707 MHz
Standard memory config: 3072 MB (1536 MB per GPU)
Memory interface width: 768-bit (384-bit per GPU)
Memory bandwidth: 331 GB/s
Tesla C2070
448 CUDA cores
CUDA core frequency: 1.15 GHz
Memory speed: 1.5 GHz
Memory interface width: 384-bit
Memory bandwidth: 144 GB/s
Questions
1) The NVIDIA documentation shows 1.15 GHz as the operating frequency of the CUDA cores for the Tesla C2070. Is that the analogous specification to the 630 MHz of the GTX 590? And if it is, how come there is such a great difference between cards of a similar architecture?
2) Apart from that and the maximum memory support, everything turns out to be in favour of the GTX 590 (4 of them will be used, liquid cooled, so I presume the memory issue is addressed).
3) Being a GF110, does the GTX 590 also support ECC?
4) Where do Quadros stand in the midst of all this?
Any light on these doubts that have been haunting me for quite some time will be highly appreciated.
As this goes towards my project research I can't help but question everything there is, so please bear with me.
The post does shed a lot of light on the differences between the Tesla 20-series and the GTX 480. But isn't the GTX 480 supposed to be based on the GF100, whereas the 590 is based on the GF110?
What I ask of the pros here is: is the post as relevant to the 590 as it is to the 480?
The only difference between the GF100 and GF110 is power usage (improved), and the maximum number of multiprocessors per GPU (16, up from 15). Keep in mind that the GTX 590 is two GPUs (each with 512 CUDA cores), and is programmed as two separate CUDA devices with no shared resources. You must manually partition your work between the two devices. The fastest single CUDA device is still the GTX 580.
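To make the "two separate CUDA devices" point concrete, here is a minimal sketch (my addition, not from the original posts) of partitioning work by hand across the two GPUs of a GTX 590; the kernel and sizes are only placeholders:

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

int main(void)
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);        /* a GTX 590 shows up as 2 devices */
    printf("CUDA devices found: %d\n", deviceCount);
    if (deviceCount == 0)
        return 1;

    const int n = 1 << 20;
    const int perDevice = n / deviceCount;   /* manual partition of the work */

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaSetDevice(dev);                  /* subsequent calls target this GPU */
        float *d_data;
        cudaMalloc(&d_data, perDevice * sizeof(float));
        scale<<<(perDevice + 255) / 256, 256>>>(d_data, perDevice);
        cudaDeviceSynchronize();
        cudaFree(d_data);
    }
    return 0;
}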
Regarding your original questions:
The clocks you are comparing between GeForce and Tesla are not equivalent. The clock to look at on the GeForce cards is sometimes called the "shader clock". For the GTX 590, that clock is 1.215 GHz. I find this Wikipedia page to be a better reference for comparing NVIDIA GPUs than most of the product pages out there.
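If you want to confirm which clock CUDA itself reports, a quick sketch (my addition, not from the original posts) is to read the clockRate field of cudaDeviceProp, which holds the shader/CUDA-core clock in kHz:

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        /* clockRate is the shader (CUDA core) clock in kHz, not the lower
           "GPU clock" quoted on the GeForce product pages. */
        printf("Device %d (%s): shader clock %.3f GHz\n",
               dev, prop.name, prop.clockRate / 1.0e6);
    }
    return 0;
}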
For single precision arithmetic and overall memory bandwidth, the GeForce cards are superior to the Tesla cards. In all the other features mentioned in the link tera posted, the Tesla is better. If you don’t need those features (I don’t), then the GeForce is the best card for the money.
No, the GeForce cards do not support ECC. The Tesla features mentioned on the page tera linked to are exclusive to the Tesla cards, despite both GeForce and Tesla cards using basically the same GPU. NVIDIA is trying to segment the market by keeping the more advanced features specific to Tesla.
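You can also verify this per card at runtime; a small sketch (my addition) that reads the ECCEnabled field of cudaDeviceProp:

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    /* ECCEnabled is 1 on Tesla cards with ECC switched on;
       GeForce cards report 0. */
    printf("%s: ECC %s\n", prop.name,
           prop.ECCEnabled ? "enabled" : "not available / disabled");
    return 0;
}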
Quadros and Teslas are very similar, but Quadros are designed for “professional 3D workstations.” (Frankly, that’s an additional layer of market segmentation completely out of my realm of experience.) A Quadro costs more than the equivalent Tesla card, which already costs quite a bit more than the nearest GeForce card. For scientific computing, the only sensible choice is between Tesla and GeForce.
Not to mention the fact that GF110 can do 64bit/FP16 filtering at full-speed compared to half-speed for GF100. Z-culling was also significantly improved with GF110.
Your reply was much more enlightening than my Googling and going through white papers and spec sheets, which answer one piece of jargon with another.
A couple more questions; shower some info if the gods here may:
1) With the GTX 580 being around for quite some time now, how come there is only one Tesla solution (the M2090) based on the GF110 with 512 cores, considering that Fermi is shown to benefit Tesla the most?
2) Something tells me that with more CUDA cores per SM, more SFUs, and more dispatch units (I don't know what they mean, though), the GF114 is better than the GF110? (Of course, only after the GF114 is scaled up to 512 cores and a 384-bit or wider memory bus.)
3) I will have to liquid cool every graphics card, so in that case does NVIDIA's quality assurance and warranty still hold for Teslas, or for other cards for that matter?
4) With so many drivers and applications that NVIDIA provides to support clustering of Tesla machines, am I at a disadvantage clustering with GTX 590s (ECC and exorbitant memory not being primary requirements)?
5) Do specifications like texture fill rate and ROPs show no correlation with the compute capability of cards?
The Teslas have longer product life cycles and get more thorough testing, so Nvidia probably did not see a reason to introduce more different SKUs with basically the same performance.
For CUDA purposes, GF114 is inferior to GF110 (even when scaling to the number of cores) as it is more difficult to keep 48 cores busy from two warps than 32 cores. So on the GF114 a higher fraction of the cores will be idle.
My guess is that you only get the warranty if the water cooling is set up by an authorized Tesla solution provider, whom you will have to contact about this.
Under Linux there isn’t as much of a difference as under Windows.
There certainly is some correlation, but GPUs of the same compute capability can still have different specs.
Thank you, tera. Looks like a couple more days here and I'll be a master too (given similar responses from the members).
Having gone through the technical part of it, now the time comes to decide.
The CUDA rigs (clustered) will be used for:
1) Running the Linpack benchmark. (The sponsor company wants to show it off, so not our call there.)
2) Running immersive automobile training simulations. (I guess GeForce's single DMA channel should do the work here, moreover giving it an edge over Quadro/Tesla; after all, it's also a video game, right? :D)
3) Cryptanalysis, which I presume, from my limited knowledge, will benefit from the Tesla's 2 DMA channels (correct me here).
The million-dollar question: given the above requirements, 4x GTX 590 or 4x Tesla C2050?
The rig will be liquid cooled and performance is of utmost importance.
As Seibert said, if you do not use DP math, then stick with GeForce. My hands-on experience is that there isn't a single reason for considering the C10x0 Teslas any more; they are beaten hands down by the GTX 580.
I hope to soon have a direct comparison of the M2090 vs. the GTX 580 on very heavy DP calculations.
Even if the results prove to be similar in the M2090 vs. GTX 580 comparison, I would still go with the M2090 for several reasons:
ECC - it is important in my case
Tesla drivers - RDP environment/driver, and no need for tweaks to make code run as a service
reliability
support
Regarding the $1M question: I would go with 4x GTX 590 - 8 GPUs are always better than 4. Things you have to consider if going with GTX are that the only remote CUDA access is through VNC, your code won't be able to run as a service without tweaks, there is no ECC, and, most importantly, the reliability of GTX 590 cards under constant heavy load is out of the question by my standards - simply not acceptable. Do some research on GTX 590 failure rates, and take into consideration that your system will be more stressed, in terms of thermal and computational load, than those single-card gaming rigs.
Sorry, I was in a hurry and didn't elaborate on this as I should have.
If you have a CUDA-enabled device in some machine and want to access that machine from a remote location (i.e. to execute code from a command prompt or do some remote development on that box), you won't be able to use Remote Desktop for such purposes in the case of a GTX card. The reason for this is that the Microsoft RDP driver is loaded instead of the nV driver during a Remote Desktop session, and therefore no CUDA-capable devices are found. In the case of Tesla cards, their driver is loaded during the RDP session and you can use your CUDA-enabled (Tesla) hardware normally.
In the case of GTX cards you will have to use a VNC connection on a Windows box to be able to use VS for development or run programs from the command line.
Also, the Tesla drivers allow your code to be executed as a service on the Windows platform without any workarounds.
I do not have enough experience with the Linux and OS X environments to provide you details about them.
Life would have been better had there been a C2090 for the benchmarking.
A few more questions (although they may not fall under any acceptable order of logic):
1) In my previous post I mentioned Tesla's 2 DMA engines vs. GeForce's 1, hoping someone would throw some light on the difference it would make. Anyone? Please.
2) About running the real-life simulation, which would require stereoscopic displays: will my rig of 4x Tesla C2050 (if Tesla emerges as the winner; the GTX 590 is leading for now) be able to support that (stereoscopy in particular, it being a feature of the GeForce and Quadro series)?
3) What difference will it make running CUDA with and without SLI (4x GTX 590 or 4x Tesla C2050)? I read this
2 DMA engines do not make a big difference, as the same result can be achieved by running a kernel to transfer the data. This of course assumes that you write your software yourself or the original author has come across this.
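For completeness, the case where the second copy engine does matter is overlapping an upload with a download on different streams; with a single DMA engine the two cudaMemcpyAsync calls below serialize. A minimal sketch (my addition; buffer names and sizes are made up):

#include <cuda_runtime.h>

int main(void)
{
    const size_t bytes = 64 << 20;
    float *h_in, *h_out, *d_in, *d_out;

    /* Pinned host memory is required for asynchronous copies to overlap. */
    cudaMallocHost(&h_in, bytes);
    cudaMallocHost(&h_out, bytes);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    /* With two copy engines (Tesla) the host-to-device and device-to-host
       copies can run at the same time; with one engine (GeForce) they queue. */
    cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, s0);
    cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, s1);
    cudaDeviceSynchronize();

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}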
Should make no difference, but I have no experience in this field.
SLI makes absolutely no difference to CUDA. There was a time when SLI needed to be switched off for all devices to be available (and your quote seems to stem from that time), but this is long history.