Tesla 20-Series Features and Advantages

So with all this nice new tech rolled in, it's all the more disappointing to have 3/4 of the double-precision performance capped on the consumer cards.

Martin

Bumping this message to the top, since this question keeps coming up in the forums.

If you are going to bump the topic, would it be too much to ask that you also address the questions that were raised here?

Regarding the DMA engines: I ran a test recently and found that the GTX 480 can overlap communication in both directions, but the C2050 cannot. Do you have any insights on this?

http://forums.nvidia.com/index.php?s=&…t&p=1073297

To copy the results here:

$ ./concur_bandwidth  0

device 0: GeForce GTX 480

Device 0 took 3000.489502 ms

Test 1: Aggregate HtoD bandwidth in MB/s: 5995.058594

Device 0 took 3006.603027 ms

Test 2: Aggregate DtoH bandwidth in MB/s: 6621.408203

Device 0 took 2995.593994 ms

Test 3: Aggregate bidirectional per GPU bandwidth in MB/s: 11184.810547

$ ./concur_bandwidth  1

device 1: GeForce GTX 280

Device 1 took 2999.640137 ms

Test 1: Aggregate HtoD bandwidth in MB/s: 5995.058594

Device 1 took 3000.135498 ms

Test 2: Aggregate DtoH bandwidth in MB/s: 5860.841309

Device 1 took 2978.960693 ms

Test 3: Aggregate bidirectional per GPU bandwidth in MB/s: 5905.580078

$ ./concur_bandwidth 0

device 0: Tesla C2050

Device 0 took 3006.502441 ms

Test 1: Aggregate HtoD bandwidth in MB/s: 6129.276855

Device 0 took 2990.946533 ms

Test 2: Aggregate DtoH bandwidth in MB/s: 5681.883789

Device 0 took 2988.590332 ms

Test 3: Aggregate bidirectional per GPU bandwidth in MB/s: 6889.844238
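As a side note, the aggregate figures above already tell the overlap story: if the two copy directions truly run concurrently, the bidirectional number should approach the sum of the two one-way numbers; if they serialize, it stays near a single one-way figure. A small sanity-check sketch (plain Python, numbers taken directly from the output above):

```python
# Classify whether bidirectional host/device copies overlapped,
# based only on the aggregate bandwidth figures the benchmark printed.
def overlap_ratio(htod, dtoh, bidir):
    """Bidirectional bandwidth as a fraction of the fully overlapped
    ideal, i.e. the sum of the two one-way figures."""
    return bidir / (htod + dtoh)

# Figures from the runs quoted above (MB/s): (HtoD, DtoH, bidirectional)
cards = {
    "GeForce GTX 480": (5995.06, 6621.41, 11184.81),
    "GeForce GTX 280": (5995.06, 5860.84, 5905.58),
    "Tesla C2050":     (6129.28, 5681.88, 6889.84),
}

for name, (htod, dtoh, bidir) in cards.items():
    r = overlap_ratio(htod, dtoh, bidir)
    verdict = "overlapping" if r > 0.75 else "mostly serialized"
    print(f"{name}: {r:.2f} of ideal -> {verdict}")
```

By this measure the GTX 480 reaches about 89% of the fully overlapped ideal, while the GTX 280 and the C2050 sit near 50-60%, i.e. their copies are effectively serialized in this test.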

That's what I also say: the Tesla 20-series cards are more capable than the current consumer cards, but they carry a high price tag of 2500 euros. They have compute capability 2.0 and abundant GDDR5 memory, so they are better than a GeForce Fermi card. This is why I believe they are taking their time to create a Tesla Fermi card. I do not know any details; I have no connection with NVIDIA to know any kind of specs.

Well, to begin with, the clocks are lowered to decrease thermal failure rates, so bandwidth and TFLOPS take a hit.

How is that "more capable" or "more tested"? More tested just to select chips for lowering their frequency?

I have no inside information and this is only my suspicion, but I seriously doubt that any additional testing or treatment beyond lowering the clocks takes place on Teslas. How do you convince me otherwise? Where are the test results on the reliability of Tesla versus GeForce cards?

NVIDIA should explain how it is that the GTX 280 (especially a factory-overclocked one, such as I have) is almost as fast in single precision as the GTX 480 (which I also have). NVIDIA's official data sheets say 0.933 TFLOP/s single precision on the old cards and 1.03 TFLOP/s on the new Tesla compute cards. So where's the progress?

Per CUDA core, there is now less bandwidth! Most codes are bandwidth-limited; it's hard to do dozens of arithmetic operations on one float before returning it to global memory. Improvement? Where?
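To put a number on the bandwidth-limited point: divide peak single-precision throughput by the rate at which floats can be fetched from global memory, and you get the break-even arithmetic intensity, i.e. how many operations a kernel must perform per loaded float before it stops being bandwidth-bound. A rough sketch, using commonly cited peak specs (the GFLOPS and GB/s figures are my assumptions, not taken from this thread):

```python
# Break-even arithmetic intensity: single-precision operations needed
# per float fetched from global memory before a kernel becomes
# compute-bound rather than bandwidth-bound.
def flops_per_float(peak_gflops, mem_bw_gbs, bytes_per_elem=4):
    floats_per_sec = mem_bw_gbs * 1e9 / bytes_per_elem
    return peak_gflops * 1e9 / floats_per_sec

gtx280 = flops_per_float(622.0, 141.7)   # GTX 280: MAD-only peak, 141.7 GB/s
gtx480 = flops_per_float(1345.0, 177.4)  # GTX 480: FMA peak, 177.4 GB/s

print(f"GTX 280 break-even: {gtx280:.1f} flops per float")
print(f"GTX 480 break-even: {gtx480:.1f} flops per float")
```

So the break-even point rose from roughly 18 to roughly 30 operations per float: a kernel has to do nearly twice as much arithmetic per loaded value on the GTX 480 before the extra compute pays off, which is the complaint above in quantitative form.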

The GTX 295 had dual GPUs and better price-performance than the Fermis. Again, where is the progress?

Disabling 3/4 of the DP units for marketing reasons on the GTX 480, and slapping a five-times-higher list price on the C2050 ($2500 vs. $500) for a card with slower clocks than the GTX 480: what sense does that make? I'll buy 5x more cards, and even if there is some truth to the lower reliability of the GeForce Fermis, I'll still be much better off (in single precision).

I'm mad because I'd like to have both better single-precision price/performance and better DP capability, but now I have a dilemma.

In my personal view, having both the old and new architectures in my computers, the new one is overall roughly 30% better and not really cheaper. This needs to be compared with the boasts of NVIDIA's leaders a few years ago, when they were predicting huge financial problems for the competitor (AMD) and said they would aim at doubling the computational power every year! No such thing ever happened. Fermi cards are not significantly better than GT200 cards for numerical applications in single precision (and most science can be done in single precision, including CFD).

C++ on the device is nice, but not an absolute must for scientific programmers.

Personally, right now I'd love to be able to buy a bunch of GTX 295s at, say, $350-400, since the Tesla price tag forces me out of the double-precision area anyway. However, the 295s are totally gone, and you can't even find them second-hand on eBay.

All in all, the NVIDIA PR machine should be congratulated on a pretty good job. For a while I was overwhelmed by how much better the GTX 480 would be than the GTX 280, until the DP issue emerged and the Tesla cards were revealed to be essentially GTX 470s with enabled DP units and one more DMA engine (which, in practice, people don't always immediately see working, as this thread documents).

I think the answer to the question "why should I buy Tesla and not GeForce" is: maybe you shouldn't!

The explanation is that the GTX 280 wasn't really 933 GFLOP/s single precision; it was 622 GFLOP/s, with the ability to dual-issue under a very specific and limited set of conditions to yield a very theoretical 933 GFLOP/s. Those conditions for dual issue rarely, if ever, occurred in real code. So there is actually a lot of progress, but it doesn't look like it, only because NVIDIA was a bit "optimistic" in how it described the performance of the previous-generation card. In all my testing so far, Fermi is a very big improvement over the GT200.
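The peak figures being argued about all follow from cores × shader clock × flops per cycle. A quick sketch, using the commonly cited core counts and shader clocks (these specs are my assumptions, not stated in the thread; the ~0.62 vs ~0.93 TFLOP/s split for the GTX 280 is exactly the MAD-only vs MAD-plus-dual-issued-MUL distinction):

```python
# Theoretical single-precision peak = cores * shader clock (GHz) * flops/cycle.
# Core counts and shader clocks below are the commonly cited figures.
def peak_gflops(cores, shader_ghz, flops_per_cycle):
    return cores * shader_ghz * flops_per_cycle

gtx280_mad  = peak_gflops(240, 1.296, 2)  # MAD only: what real code saw
gtx280_dual = peak_gflops(240, 1.296, 3)  # MAD + dual-issued MUL: marketing peak
c2050       = peak_gflops(448, 1.150, 2)  # Fermi FMA counts as 2 flops/cycle
gtx480      = peak_gflops(480, 1.401, 2)

print(f"GTX 280 (MAD only):   {gtx280_mad:7.1f} GFLOP/s")
print(f"GTX 280 (dual issue): {gtx280_dual:7.1f} GFLOP/s")
print(f"Tesla C2050:          {c2050:7.1f} GFLOP/s")
print(f"GTX 480:              {gtx480:7.1f} GFLOP/s")
```

This reconciles the numbers above: the ~0.93 TFLOP/s marketing figure for the GT200 required the rarely-achieved dual-issued MUL, while the C2050's 1.03 TFLOP/s is all FMA and therefore much closer to what real code can reach.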

Those test results would indeed be very nice. In the past there was some mention of MTBF numbers becoming available, but I have never seen them.

As for your question "where's the progress": you are comparing marketing FLOPS from the GT200 with real FLOPS from the GF100. 622 GFLOP/s → 1.03 TFLOP/s is more accurate for real-life software.

Oh, and the answer to the question "why should I buy Quadro or Tesla" is simple: support. I was at a symposium about GPUs on Wednesday, and while companies that were putting GeForce cards into their products were complaining about support, companies that put Quadro or Tesla in their products were very happy with the support they receive from NVIDIA.

I don't completely agree with your last statement. It's true that there's no real support for consumer cards the way companies would wish, but NVIDIA could offer it too, maybe for a monthly fee or something like that. I even think they could find many new customers this way. Perhaps the whole story of only supporting Teslas is about selling more of them and creating a new branch of GPUs that wouldn't be necessary otherwise. I'm not speaking about the advantages of Teslas, like increased global memory. Some companies just don't need 4 GB of gmem and DP, and would be very happy with a regular consumer card if the support were just a little bit better.

PNY sells Quadro and Tesla cards, as far as I understand (all reference design, standard clocks).

Consumer cards are sold by numerous companies, so the whole idea of supporting consumer cards is a bit difficult with lots of different cards, some overclocked, some non-reference design.

I understand that, but they could still offer the same support for consumer cards from at least one manufacturer. There is no reason not to offer the same support as for the Fermis.

I am not sure how the other manufacturers would think of that…

I joined the community a few days ago, but I had been following the forums for quite a long time before that. I got the impression that the NVIDIA representatives on this forum deliberately ignore questions.

It makes me sick not being able to run CUDA apps via Remote Desktop on plain GTX 4xx cards. I don't care about DP performance at the moment. My company is doing some proof-of-concept development, and we do not want to invest thousands of euros in a Tesla at this stage. However, we really miss the possibility of testing applications via RDP.

It is completely unbelievable that such a driver feature cannot be enabled for GTX cards. What is the reason for disabling the RDP driver feature on GTX cards? Can I get a precise answer, please?

thanks

Mirko