Could anyone benchmark this for me on a 780 (Ti) or Titan?

cbuchner1 · November 20, 2013, 10:53am

My guess is that a stock 780 Ti could do 400-450 kHash/s, simply extrapolating from the performance of my GT 640 (GK208 chip), but due to a lack of testers I haven’t been able to confirm this yet.

Find this program (Windows binary + source code) here:.

Linux source code is best taken from GitHub directly: cbuchner1/CudaMiner, a CUDA accelerated litecoin mining application based on pooler's CPU miner.

I do have kernels for all major CUDA architectures inside. It gets harder and harder to optimize them further. nVidia cards are still lagging behind AMD cards in terms of performance, but I’ve been able to close the gap somewhat ;) The high end AMD cards push numbers in the 800 kHash/s range.

cbuchner1 · November 20, 2013, 10:54am

The command line required to launch this for benchmarking is:

cudaminer --benchmark

It will first auto-tune to find a suitable kernel launch configuration (this may take some time), and then it reports some kHash/s values in the console.

Jimmy_Pettersson · November 20, 2013, 11:20am

Kudos for the easy to run benchmark!

It’s still running on my GTX Titan; it reported 393 kHash/s.

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\hpc\Downloads\cudaminer-2013-11-20\cudaminer-2013-11-20>cudaminer.exe --benchmark
           *** CudaMiner for nVidia GPUs by Christian Buchner ***
                     This is version 2013-11-20 (alpha)
        based on pooler-cpuminer 2.3.2 (c) 2010 Jeff Garzik, 2012 pooler
               Cuda additions Copyright 2013 Christian Buchner
           My donation address: LKS1WDKGED647msBQfLBHV3Ls8sveGncnm

[2013-11-20 12:03:30] 1 miner threads started, using 'scrypt' algorithm.
[2013-11-20 12:03:30] Binding thread 0 to cpu 0
[2013-11-20 12:03:49] GPU #0: GeForce GTX TITAN with compute capability 3.5
[2013-11-20 12:03:49] GPU #0: interactive: 0, tex-cache: 0 , single-alloc: 0
[2013-11-20 12:03:50] GPU #0: Performing auto-tuning (Patience...)
[2013-11-20 12:07:02] GPU #0:  393.34 khash/s with configuration T28x18
[2013-11-20 12:07:02] GPU #0: using launch configuration T28x18
[2013-11-20 12:07:02] GPU #0: GeForce GTX TITAN, 16128 hashes, 0.08 khash/s
[2013-11-20 12:07:02] Total: 0.08 khash/s
[2013-11-20 12:07:02] GPU #0: GeForce GTX TITAN, 16128 hashes, 161.27 khash/s
[2013-11-20 12:07:02] Total: 161.27 khash/s
[2013-11-20 12:07:04] GPU #0: GeForce GTX TITAN, 806400 hashes, 310.85 khash/s
[2013-11-20 12:07:04] Total: 310.85 khash/s
[2013-11-20 12:07:09] GPU #0: GeForce GTX TITAN, 1564416 hashes, 313.93 khash/s
[2013-11-20 12:07:09] Total: 313.93 khash/s
[2013-11-20 12:07:14] GPU #0: GeForce GTX TITAN, 1580544 hashes, 313.27 khash/s
[2013-11-20 12:07:14] Total: 313.27 khash/s
[2013-11-20 12:07:19] GPU #0: GeForce GTX TITAN, 1580544 hashes, 314.14 khash/s
[2013-11-20 12:07:19] Total: 314.14 khash/s
[2013-11-20 12:07:25] GPU #0: GeForce GTX TITAN, 1580544 hashes, 312.84 khash/s
[2013-11-20 12:07:25] Total: 312.84 khash/s
[2013-11-20 12:07:30] GPU #0: GeForce GTX TITAN, 1564416 hashes, 314.25 khash/s
[2013-11-20 12:07:30] Total: 314.25 khash/s
[2013-11-20 12:07:35] GPU #0: GeForce GTX TITAN, 1580544 hashes, 309.89 khash/s
[2013-11-20 12:07:35] Total: 309.89 khash/s
[2013-11-20 12:07:40] GPU #0: GeForce GTX TITAN, 1564416 hashes, 314.44 khash/s
[2013-11-20 12:07:40] Total: 314.44 khash/s
[2013-11-20 12:07:45] GPU #0: GeForce GTX TITAN, 1580544 hashes, 313.89 khash/s
[2013-11-20 12:07:45] Total: 313.89 khash/s
[2013-11-20 12:07:50] GPU #0: GeForce GTX TITAN, 1580544 hashes, 312.59 khash/s
[2013-11-20 12:07:50] Total: 312.59 khash/s
[2013-11-20 12:07:55] GPU #0: GeForce GTX TITAN, 1564416 hashes, 313.93 khash/s
[2013-11-20 12:07:55] Total: 313.93 khash/s
[2013-11-20 12:08:00] GPU #0: GeForce GTX TITAN, 1580544 hashes, 311.36 khash/s
[2013-11-20 12:08:00] Total: 311.36 khash/s
[2013-11-20 12:08:05] GPU #0: GeForce GTX TITAN, 1564416 hashes, 313.24 khash/s
[2013-11-20 12:08:05] Total: 313.24 khash/s
[2013-11-20 12:08:10] GPU #0: GeForce GTX TITAN, 1580544 hashes, 314.08 khash/s
[2013-11-20 12:08:10] Total: 314.08 khash/s
[2013-11-20 12:08:15] GPU #0: GeForce GTX TITAN, 1580544 hashes, 313.09 khash/s
[2013-11-20 12:08:15] Total: 313.09 khash/s
[2013-11-20 12:08:20] GPU #0: GeForce GTX TITAN, 1580544 hashes, 313.96 khash/s
[2013-11-20 12:08:20] Total: 313.96 khash/s
[2013-11-20 12:08:25] GPU #0: GeForce GTX TITAN, 1580544 hashes, 314.21 khash/s
[2013-11-20 12:08:25] Total: 314.21 khash/s
[2013-11-20 12:08:30] GPU #0: GeForce GTX TITAN, 1580544 hashes, 309.65 khash/s
[2013-11-20 12:08:30] Total: 309.65 khash/s
[2013-11-20 12:08:35] GPU #0: GeForce GTX TITAN, 1548288 hashes, 312.64 khash/s
[2013-11-20 12:08:35] Total: 312.64 khash/s
[2013-11-20 12:08:40] GPU #0: GeForce GTX TITAN, 1564416 hashes, 313.49 khash/s
[2013-11-20 12:08:40] Total: 313.49 khash/s
[2013-11-20 12:08:45] GPU #0: GeForce GTX TITAN, 1580544 hashes, 314.52 khash/s
[2013-11-20 12:08:45] Total: 314.52 khash/s
[2013-11-20 12:08:50] GPU #0: GeForce GTX TITAN, 1580544 hashes, 313.33 khash/s
[2013-11-20 12:08:50] Total: 313.33 khash/s
[2013-11-20 12:08:55] GPU #0: GeForce GTX TITAN, 1580544 hashes, 312.65 khash/s
[2013-11-20 12:08:55] Total: 312.65 khash/s
[2013-11-20 12:09:00] GPU #0: GeForce GTX TITAN, 1564416 hashes, 313.56 khash/s
[2013-11-20 12:09:00] Total: 313.56 khash/s
[2013-11-20 12:09:05] GPU #0: GeForce GTX TITAN, 1580544 hashes, 314.14 khash/s
[2013-11-20 12:09:05] Total: 314.14 khash/s
[2013-11-20 12:09:10] GPU #0: GeForce GTX TITAN, 1580544 hashes, 313.77 khash/s
[2013-11-20 12:09:10] Total: 313.77 khash/s
[2013-11-20 12:09:15] GPU #0: GeForce GTX TITAN, 1580544 hashes, 314.08 khash/s
[2013-11-20 12:09:15] Total: 314.08 khash/s
[2013-11-20 12:09:20] GPU #0: GeForce GTX TITAN, 1580544 hashes, 313.71 khash/s
[2013-11-20 12:09:20] Total: 313.71 khash/s
[2013-11-20 12:09:25] GPU #0: GeForce GTX TITAN, 1580544 hashes, 309.23 khash/s

Jimmy_Pettersson · November 20, 2013, 11:22am

Btw, I also have a K20 but I doubt it would improve performance further.

I don’t know much about mining but this seems like pretty good performance for NV GPUs :-)

cbuchner1 · November 20, 2013, 12:31pm

The results reported by the autotuning procedure and the actual results achieved during mining show quite a substantial difference. So in the end the program only achieves 313 kHash/s.

I have yet to figure out why autotune’s results are too optimistic.
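To put a number on that gap, here is a minimal sketch that parses a few lines copied from the log above and compares the steady-state rate with the autotune figure. The regular expression and the `min_hashes` warm-up cutoff are my own assumptions for illustration, not anything cudaminer does itself.

```python
import re

# Autotune figure from the log line "393.34 khash/s with configuration T28x18"
AUTOTUNE_KHASH = 393.34

# A few lines copied verbatim from the benchmark log in this thread
LOG = """\
[2013-11-20 12:07:02] GPU #0: GeForce GTX TITAN, 16128 hashes, 0.08 khash/s
[2013-11-20 12:07:02] GPU #0: GeForce GTX TITAN, 16128 hashes, 161.27 khash/s
[2013-11-20 12:07:04] GPU #0: GeForce GTX TITAN, 806400 hashes, 310.85 khash/s
[2013-11-20 12:07:09] GPU #0: GeForce GTX TITAN, 1564416 hashes, 313.93 khash/s
[2013-11-20 12:07:14] GPU #0: GeForce GTX TITAN, 1580544 hashes, 313.27 khash/s
[2013-11-20 12:07:19] GPU #0: GeForce GTX TITAN, 1580544 hashes, 314.14 khash/s
"""

# Captures the hash count and the rate of a per-GPU report line
LINE = re.compile(r"GPU #\d+: .*, (\d+) hashes, ([\d.]+) khash/s")


def steady_rates(log, min_hashes=100_000):
    """Return the khash/s readings, skipping short warm-up batches.

    min_hashes is a hypothetical cutoff chosen to drop the first two
    16128-hash readings, which are clearly not representative.
    """
    rates = []
    for line in log.splitlines():
        match = LINE.search(line)
        if match and int(match.group(1)) >= min_hashes:
            rates.append(float(match.group(2)))
    return rates


rates = steady_rates(LOG)
mean_rate = sum(rates) / len(rates)
shortfall = 1 - mean_rate / AUTOTUNE_KHASH
print(f"steady state: {mean_rate:.1f} khash/s, {shortfall:.0%} below autotune")
```

The "Total:" lines are ignored on purpose, since with a single GPU they only repeat the per-GPU value.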

Thanks for running the tests!

Jimmy_Pettersson · November 20, 2013, 1:24pm

No problem, will be happy to run more tests if you do further code updates. Make sure to PM me as I don’t check the forums as often as I used to. :-)

blade613x · November 20, 2013, 3:44pm

I found Titan to work best using “-l K14x24 -C 1” in the command line. But I haven’t played around with it too much. And also, DP enabled makes it run hotter and slower.

At stock 1006/6000 (boost clocks) I get ~405 kH/s (360 kH/s with DP enabled),
and at 1150/6000 (overclocked) I get ~455 kH/s (420 kH/s with DP enabled).

Unfortunately, the power draw really ruins any chance I have of running Titan as a miner. But cudaminer is a great start to making GPU mining more competitive for the green team.
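The launch configuration strings in this thread (T28x18 from the autotune log, K14x24 above) can be unpacked with a small sketch like the one below. It assumes the format is a kernel letter, then the number of blocks, an "x", and the number of warps per block, with one hash per thread. That reading is my assumption, but it is consistent with the log: 28 x 18 x 32 = 16128, which matches the "16128 hashes" reported right after autotuning.

```python
import re

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def parse_launch_config(cfg):
    """Split a string such as 'T28x18' into (kernel letter, blocks, warps).

    The field meanings (blocks x warps per block) are an assumption.
    """
    match = re.fullmatch(r"([A-Za-z])(\d+)x(\d+)", cfg)
    if match is None:
        raise ValueError(f"unrecognized launch configuration: {cfg!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def hashes_per_launch(cfg):
    """Hashes per kernel launch, assuming one hash per GPU thread."""
    _, blocks, warps = parse_launch_config(cfg)
    return blocks * warps * WARP_SIZE


for cfg in ("T28x18", "K14x24"):
    print(cfg, hashes_per_launch(cfg))
```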

cbuchner1 · November 20, 2013, 3:52pm

For those doing benchmarks: using the flag -H 1 may remove a CPU limitation. The CPU does SHA256 hashes before and after CUDA runs the scrypt core kernels. If -H 1 is not given, the CPU part runs on a single core only, which can limit the kHash/s values reported after autotuning. -H 1 enables the use of parallel_for constructs to distribute the workload across all cores.

The benchmark should therefore be run with

cudaminer -H 1 --benchmark
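As a rough illustration of the CPU-side work that -H 1 spreads out, here is a sketch of the first scrypt stage distributed over a pool of workers. It is not cudaminer's code. It assumes the standard scrypt construction with Litecoin's parameters (r=1, p=1), where the first stage is one iteration of PBKDF2-HMAC-SHA256 producing 128 bytes, and it uses a hypothetical all-zero 76-byte header prefix with a 4-byte nonce appended. A Python thread pool stands in for parallel_for, so it shows how the work is split and makes no claim about speed.

```python
import hashlib
import struct
from concurrent.futures import ThreadPoolExecutor


def pre_hash(header: bytes) -> bytes:
    # First CPU stage of scrypt with r=1, p=1 (assumed parameters):
    # PBKDF2-HMAC-SHA256, one iteration, 128 bytes handed to the GPU core.
    return hashlib.pbkdf2_hmac("sha256", header, header, 1, dklen=128)


def make_headers(prefix: bytes, first_nonce: int, count: int):
    # Hypothetical work unit: fixed prefix plus a 32-bit little-endian nonce
    return [prefix + struct.pack("<I", first_nonce + i) for i in range(count)]


def pre_hash_serial(headers):
    # Single-core behaviour: one header after another
    return [pre_hash(h) for h in headers]


def pre_hash_parallel(headers, workers=4):
    # Each header is independent, so the loop can be handed to a pool.
    # pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pre_hash, headers))


headers = make_headers(bytes(76), first_nonce=0, count=64)
serial = pre_hash_serial(headers)
parallel = pre_hash_parallel(headers)
print(serial == parallel)
```

Because every nonce is hashed independently, no synchronization is needed beyond collecting the results in order.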

Also, you can enable one of two caching options using -C 1 or -C 2 when trying the Kepler kernel. The Titan kernel automatically caches its global memory accesses by means of the __ldg() intrinsic and ignores the -C option.

It is surprising that the Kepler kernel may be fastest on a Titan card, as on my GK208 based card (Compute capability 3.5) the Titan kernel wins with a notable performance edge - it makes direct use of the funnel shifter.
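For readers wondering why the funnel shifter matters here: the Salsa20/8 core inside scrypt is built from 32-bit rotations (by 7, 9, 13 and 18 bits), and a rotation is a funnel shift with the same word fed in as both halves. The sketch below models the semantics of CUDA's `__funnelshift_l` in plain Python and checks it against the usual two-shifts-and-an-OR rotation. It is a model of the instruction's behaviour, not the kernel code.

```python
MASK32 = 0xFFFFFFFF


def funnelshift_l(lo, hi, shift):
    """Model of CUDA's __funnelshift_l(lo, hi, shift).

    Concatenates hi:lo into 64 bits, shifts left by (shift & 31),
    and returns the most significant 32 bits.
    """
    wide = ((hi & MASK32) << 32) | (lo & MASK32)
    return ((wide << (shift & 31)) >> 32) & MASK32


def rotl32(x, n):
    # A rotate is a funnel shift with the same word as both inputs:
    # one operation on hardware that has a funnel shifter.
    return funnelshift_l(x, x, n)


def rotl32_two_shifts(x, n):
    # The fallback without a funnel shifter: two shifts plus an OR.
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


# Rotation amounts used by the Salsa20 quarter-round
for n in (7, 9, 13, 18):
    assert rotl32(0x12345678, n) == rotl32_two_shifts(0x12345678, n)
print(hex(rotl32(0x80000001, 1)))
```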

For real mining work for Litecoin or other crypto coins you can also pass the -i 0 flag which utilizes the card nearly 100%. The default is to leave a millisecond of sleep time between kernels, to allow for some display interactivity (assuming the GPU also drives a monitor).

blade613x · November 20, 2013, 7:21pm

That’s what I was doing. I was passing the -i 0 flag and running it on my second Titan. I forgot to mention that so my numbers are with the -i 0 flag.

Jimmy_Pettersson · November 21, 2013, 9:45am

With

cudaminer -H 1 --benchmark

I saw some improvement on my machine, ~350 kHash/s.

cbuchner1 · November 21, 2013, 3:41pm

Thanks for the update, Jimmy.

I ordered a GTX 780 Ti. I feel confident that I can get this beast to output 400 kHash/s minimum. While it’s still not a good investment for cryptocoin mining (ATI cards rule), it certainly is the sexiest CUDA card out there, if you don’t care about double precision arithmetic or 6 GB of memory. And I do a lot of other programming in CUDA and OpenGL, and some gaming as well.

BTW: one litecoin = 7 EUR currently. Sweet. Graphics cards paying for themselves - that is nice.

Jimmy_Pettersson · November 22, 2013, 8:58am

i++;

I rarely work in double precision. I guess the only thing I would miss would be the 6 GB of memory but let’s not get too spoiled. :-)

Btw, I was looking at some die shots of the SMX on GK104 cores and GK110, there really seems to be a significant size difference. I wonder if we’ll continue to see this hardware divergence into compute and gaming.

seibert · November 22, 2013, 1:59pm

Probably, as long as GPUs push the die size envelope. A smaller die lowers the defect probability for any given die, reducing losses. Die harvesting by disabling defective SMXs helps some, but I don’t think you can beat having the smallest die possible. As long as compute applications are willing to pay a price premium, they will get the big chips. :)
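The die size argument can be made concrete with the simple Poisson yield model, where the probability of a defect-free die is exp(-area x defect density). In the sketch below the defect density is a made-up illustrative value, and the die areas (about 300 mm² and 560 mm², roughly the sizes of a GK104-class and a GK110-class die) are approximate figures used only for illustration.

```python
import math


def poisson_yield(die_area_mm2, defects_per_mm2):
    """Fraction of dies expected to have zero defects (Poisson model)."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


# Hypothetical defect density, chosen only for illustration
DEFECT_DENSITY = 0.001  # defects per mm^2

# Approximate die areas in mm^2 (illustrative, not exact figures)
for name, area in (("smaller die", 300), ("larger die", 560)):
    print(f"{name}: {poisson_yield(area, DEFECT_DENSITY):.0%} defect-free")
```

The model ignores die harvesting, which recovers some of the defective large dies by disabling the affected SMX units.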

On the other end of the compute scale, where die area is so small as to not matter much, I find it interesting that thermal cap is the new product differentiator. The iPhone 5s, iPad mini, and iPad all use the same CPU/GPU chip now, and differ in the configuration of their maximum thermal output. Their max clock rates are nearly identical (1.3 GHz for iPhone, 1.4 GHz for both iPads), but the dynamic clock rates vary to stay within the power limitations of the particular device.

With GPU Boost now in Tesla, I think we are solidly in the “constant power, variable clock” era. I fully expect to see the Visual Profiler start rating the energy efficiency of our programs within 3-5 years. :)

cbuchner1 · November 22, 2013, 2:44pm

The constant power, variable clock also played a role in my optimization of the Kepler compute kernel.

I was able to optimize out some redundant arithmetic operations (array indexing computations) that were situated in between memory loads/stores. These did not really slow down execution because the loads/stores had large latencies. However, these redundant instructions consumed additional power. With them optimized out, the GPU had more power to spare before reaching its power limit and hence clocked higher on average. So I did get my speed gain after all.

This optimization benefit was not seen in Fermi and Legacy kernels because these GPUs do not have the same dynamic clocking based on power cap.

seibert · November 23, 2013, 1:28am

Hah, that’s awesome! Now I really want the tools to help me maximize the average clock rate on Kepler…

cbuchner1 · November 24, 2013, 6:38am

My GTX 780 Ti reaches 430 kHash/s with cudaminer ;)

Maybe more if I add some overclocking.

Jimmy_Pettersson · November 24, 2013, 12:02pm

That’s great! Now we can add yet another variable to our N-dimensional optimization space :-)

I expect future versions of Nsight and the Visual Profiler to profile power consumption during kernel execution and also give me hints about which instructions consume more/less power ;-)

Jimmy_Pettersson · November 24, 2013, 12:09pm

WOW!

cbuchner1 · December 3, 2013, 9:48am

I tried using the Kepler SHFL instruction to replace shared memory, but so far I have not been able to get the hash rates any higher.

But with some overclocking I can now get the 780Ti to output 480 kHash/s.

Which is why I am now building a proof-of-concept Megahash mining rig using all CUDA cards.

2 x GTX 780Ti
1 x GT 640 with DDR5 RAM

The thing should at least pay for itself, and in the long run maybe even make a bit of profit.
So essentially I will get two high end cards for free - which are still useful even when the mining craze has subsided.

I expect it to draw 550-600 Watts from the wall, producing 1030 kHash/s. Nearly 2 kHash/s per Watt.
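A quick check of the efficiency estimate, using only the numbers quoted above (1030 kHash/s at 550 to 600 W from the wall):

```python
def khash_per_watt(khash_per_s, watts):
    """Mining efficiency in kHash/s per Watt."""
    return khash_per_s / watts


RIG_KHASH = 1030           # expected total for the three cards
WALL_POWER_W = (550, 600)  # expected draw from the wall, low and high

best = khash_per_watt(RIG_KHASH, WALL_POWER_W[0])
worst = khash_per_watt(RIG_KHASH, WALL_POWER_W[1])
print(f"{worst:.2f} to {best:.2f} kHash/s per Watt")
```

So the estimate works out to roughly 1.7 to 1.9 kHash/s per Watt, depending on which end of the power range the rig lands on.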

And they said nVidia cards sucked for Litecoin mining.

Mining currently has a HUGE impact on the GPU market. AMD cards selling out everywhere. nVidia to the rescue. ;)

Jimmy_Pettersson · December 3, 2013, 7:25pm

Great job Christian!

It seems the best AMD Litecoin miners are between 1.9 and 3.16 kHash/s per Watt, so your rig sounds like it will be really competitive!

Is it correct that Litecoin is still profitable on GPUs while Bitcoin no longer is?
