The price jumped from a few dollars to above 30 dollars per coin and has been holding up great so far. Now a lot of people are investing in new mining hardware (mostly AMD GPUs). There is no competition from FPGA and ASIC chips yet (and from an engineering standpoint I would say: this will last). Designing ASICs with big, fat memory pipes is kind of hard.
Bitcoin mining on GPUs hasn’t been profitable for a long time. FPGAs and ASICs killed that idea.
cudaminer gets 150 kHash/s on a Compute 3.0 device (Amazon EC2 instance) while David Andersen’s miner gets 220 kHash/s. EDIT: so far he has disclosed only that it’s targeting Kepler, using the warp shuffle instruction.
I am currently peaking at 480 kHash/s on a GTX 780Ti (with some overclocking), and I have also been able to reduce the CPU load to near zero. With David’s changes these critters could exceed 600 kHash/s, becoming serious rivals to AMD cards!
My dedicated mining rig build with 3x 780Ti cards is currently not functional because I do not yet have an adequate power supply (the affordable ones all seem to be on back order). For now I am running two of the three cards in separate PCs.
David Andersen’s work has boosted the hash rate of cudaminer to 550 kHash/s on my slightly OC’ed 780Ti. I’ve seen some extreme overclockers report 650 kHash/s. That’s way into AMD territory.
AMD cards are still cheaper to acquire, though, so they will keep their edge over nVidia for Litecoin (scrypt-based) mining.
The biggest speed gains were for Compute 3.0 devices - a 50% gain in some cases. Ah, the power of the SHFL instruction.
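For anyone who hasn’t played with it yet: SHFL (__shfl() and friends) lets the threads of a warp read each other’s registers directly, with no round trip through shared memory, and it exists only on Compute 3.0 (Kepler) and newer. A minimal sketch of the idea - a plain warp-wide sum, nothing taken from cudaminer:

// Warp-wide sum done entirely in registers with the Kepler shuffle instruction.
// Each thread holds one value; __shfl_down() reads the register of the lane
// 'offset' positions further along in the same warp. Compute 3.0+ only.
__global__ void warp_sum(const int *in, int *out)
{
    int idx  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x & 31;             // lane index within the warp
    int val  = in[idx];

    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down(val, offset);     // no shared memory involved

    if (lane == 0)                           // lane 0 now holds the warp's sum
        out[idx >> 5] = val;                 // assumes blockDim.x is a multiple of 32
}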
I briefly tried the CUDA Visual Profiler from the CUDA 5.0 SDK, but I got confused by all the new counters that were introduced since I last profiled on Compute 1.x and Compute 2.x devices. I’ve since upgraded to CUDA 5.5 - maybe I should try this again. I hear the profiler has been greatly enhanced since then.
One worthwhile thing to try would be to replace shfl with shared memory to bring Dave Andersen’s principal design to Fermi and older devices. Maybe there’s still some speed-up to be discovered…
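If I do try it, the substitution would look roughly like this: a hypothetical helper (not actual cudaminer code) that emulates a per-warp shuffle through a shared memory staging buffer, relying on the classic Fermi-era warp-synchronous idiom:

// Hypothetical stand-in for __shfl(val, srcLane) on Compute 2.x devices:
// each lane publishes its register to a per-warp slice of shared memory and
// reads back the slot of the lane it wants. Within a single warp no
// __syncthreads() is needed on Fermi; 'volatile' keeps the compiler from
// caching the values in registers.
__device__ int shfl_via_shared(int val, int srcLane,
                               volatile int *warpBuf, int lane)
{
    warpBuf[lane] = val;          // publish my register
    return warpBuf[srcLane];      // read the requested lane's value
}

__global__ void rotate_demo(int *data)
{
    __shared__ int buf[256];                          // assumes <= 256 threads per block
    int idx  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x & 31;
    volatile int *warpBuf = &buf[threadIdx.x - lane]; // this warp's 32 slots

    int val = data[idx];
    // rotate values by one lane within the warp, like __shfl(val, (lane + 1) & 31)
    val = shfl_via_shared(val, (lane + 1) & 31, warpBuf, lane);
    data[idx] = val;
}

Whether something like that could actually beat the existing Fermi kernels is exactly the open question.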
In case someone is interested, you can do serious coin mining with nVidia:
I present my 1.65 MHash/s miner using 3 nVidia GTX 780Ti cards - 850 watts power draw from the wall. It runs Kubuntu 12.04. The build is a bit noisy with two out of three GPUs running at 90% fan speed (airflow needs improvement).
Mainboard: Asrock Z87 Fatal1ty Killer (a gaming mainboard with 3 PCI Express x16 slots)
CPU: Intel Core i3-4130T, LGA1150
PSU: Aerocool GT-1050S CM 1050W ATX
The mainboard wasn’t cheap, but it may later become the basis for my next desktop PC build.
Two more PCIe x1 slots are available, for which I could use powered risers to add more hashing power.
If I were to build another mining rig, I would probably use GTX 780 cards instead and run Windows to overclock them. They give the same or better performance at lower cost.
I wish there was an overclocking option for Fermi and Kepler cards on Linux.
Currently I have to use a modded video BIOS (for some extra 40 kHash/s per card).
“A new profiling feature in CUDA 5.5 allows you to profile the clocks, power, and thermal characteristics of the GPU as it executes your code. This feature is available in the NVIDIA Visual Profiler on Linux and 64-bit Windows 7/8 and NSight Eclipse Edition on Linux. Learn how to activate and use this feature by watching CUDACasts Episode 13.”
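On a headless Linux rig the same clock/power/thermal readings can also be polled programmatically through NVML, the library behind nvidia-smi - a small monitoring sketch, separate from the profiler feature quoted above (build with gcc and link against -lnvidia-ml):

// Reads power draw, SM clock and temperature of GPU 0 via NVML.
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    nvmlDevice_t dev;
    unsigned int power_mw, sm_clock_mhz, temp_c;

    if (nvmlInit() != NVML_SUCCESS)
        return 1;
    nvmlDeviceGetHandleByIndex(0, &dev);

    nvmlDeviceGetPowerUsage(dev, &power_mw);                     // milliwatts
    nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &sm_clock_mhz);   // MHz
    nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp_c);

    printf("power %.1f W, SM clock %u MHz, temperature %u C\n",
           power_mw / 1000.0, sm_clock_mhz, temp_c);

    nvmlShutdown();
    return 0;
}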
By the way, the latest cudaminer github version also does scrypt-jane hashing (for Yacoin, QQCoin) and it beats the current mining software for AMD GPUs by quite a margin.
By the way, I’ve received an unsolicited code submission from nVidia that boosts Compute 3.0 devices by ~13% and Compute 3.5 devices by 20%.
I am now mining 1.88 MHash/s on 3 GTX 780Ti cards, each one doing about 625-630 kHash/s. Sweet. With extra overclock they could do more, but Linux limits my overclocking options…
The code is available in the cudaminer github repo for anyone interested in scrypt or scrypt-jane cryptocoin mining. The respective optimized kernels have been named “Y” and “Z”… until I find a better naming system ;)
The nVidia engineer took my test_kernel code (which used __shfl() based transposition) and made it work much better. Seems I was on the right path when trying the shfl instruction, but I had stopped short of producing something useful.
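For those wondering what a __shfl() based transposition even means: a small group of lanes each holds part of the hash state in registers, and they trade words through shuffles instead of staging them in shared memory. A simplified sketch - 4 lanes transposing a 4x4 block of 32-bit words, not the actual kernel code:

// Four consecutive lanes each hold one row of a 4x4 block in registers (v[0..3]);
// after the loop each lane holds one column instead. Compute 3.0+ only.
// A production kernel would unroll this with per-lane constant indices so
// that v[] stays in registers instead of spilling to local memory.
__device__ void transpose4(unsigned int v[4])
{
    int lane = threadIdx.x & 3;          // position within the 4-lane group
    unsigned int t[4];
#pragma unroll
    for (int i = 0; i < 4; i++) {
        int src = (lane - i) & 3;        // group lane to read from this round
        // every lane contributes a different one of its own words per round
        t[src] = (unsigned int)__shfl((int)v[(lane + i) & 3], src, 4);
    }
#pragma unroll
    for (int i = 0; i < 4; i++)
        v[i] = t[i];
}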
AMD cards are no longer significantly faster, just significantly cheaper. So could nVidia lower the price please? That’d be great… ;)
The truth is I posted before reading all the posts. Right now I have the cudaminer master version from git, and when I use the -l Z flag on Ubuntu I get a segmentation fault. I will read the previous posts in a few days and check what I did wrong. I think I have the wrong cudaminer version.
Edit: I just realized the new version was submitted just a day ago, so I will download the new version and try the new kernels. I have a Titan and access to 660 Ti cards for testing.
./cudaminer -d 0 --benchmark -l Z14x24 -i 0 -H 1
*** CudaMiner for nVidia GPUs by Christian Buchner ***
This is version 2014-01-20 (beta)
based on pooler-cpuminer 2.3.2 (c) 2010 Jeff Garzik, 2012 pooler
Cuda additions Copyright 2013,2014 Christian Buchner
My donation address: LKS1WDKGED647msBQfLBHV3Ls8sveGncnm