How fast does power-efficiency improve for GPU's? I wonder the rate of power efficiency developm

Hi there,

Is there a Moore’s law for GPUs? Can we expect about 100% improvement in computing speed/power efficiency per year? For instance, given that the GTX 580 delivers about 1.5 teraflops at 250 watts (6 gflops per watt), could we project about 6 teraflops per watt ten years later, assuming a roughly 1024-fold improvement in 10 years? Is such exponential performance improvement expected for GPUs? And are there any charts or papers about the history of GPU development that I can reference?
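To make the arithmetic in the question explicit, here is a rough sketch of the compound-growth projection. The function name and parameters are made up for illustration; the only inputs taken from the post are the 1.5 teraflops, 250 watts, and the assumed 100% improvement per year.

```python
def project_efficiency(start_gflops_per_watt, annual_growth, years):
    """Efficiency after `years` of compound growth.

    annual_growth is a fraction: 1.0 means +100% per year (doubling).
    """
    return start_gflops_per_watt * (1.0 + annual_growth) ** years


# GTX 580 figures quoted in the post: ~1.5 teraflops at ~250 watts.
gtx580_gflops_per_watt = 1500.0 / 250.0

# Doubling every year for ten years is a 2**10 = 1024-fold improvement.
after_ten_years = project_efficiency(gtx580_gflops_per_watt, 1.0, 10)

print(f"start: {gtx580_gflops_per_watt:.1f} gflops/watt")
print(f"after 10 years: {after_ten_years / 1000.0:.2f} teraflops/watt")
```

So "6 teraflops per watt in ten years" is simply what yearly doubling implies; whether GPUs actually double every year is the open question.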



The graphical roadmap on page 16 of this presentation seems to suggest an improvement of about 85% per year. Although to me it looks like they downplayed Fermi’s DP GFLOP/s in order to make the scaling appear more impressive.

GPUs and CPUs are becoming the same thing. What applies to CPUs applies to GPUs as well.

Most of the perf/watt improvement currently relies on die shrinking. Of course, improved manufacturing techniques also improve things somewhat.

I read an article a while ago that gave some theoretical analysis of how die shrinking can lead to improved perf/watt. I can’t find that article now, though. Anyway, with a die-shrink factor of 0.7 and a fixed die size, TDP will remain the same but perf/watt will double or triple. GPU dies get shrunk by 0.7 around every two years. And this trend could probably continue until 8nm. But by the time we reach 8nm, we will probably get into carbon nanotubes, which will certainly present some different power profiles.
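The "same TDP, double or triple perf/watt" claim is what idealized (Dennard-style) scaling predicts. The sketch below is not from the lost article; it assumes capacitance and voltage both scale with the linear shrink factor, that clock scales with its inverse, and it ignores leakage, which real processes cannot do.

```python
def ideal_shrink(s=0.7):
    """Idealized scaling for a linear shrink factor s at fixed die size.

    Assumptions (textbook idealization, leakage ignored):
      capacitance ~ s, voltage ~ s, max clock ~ 1/s,
      dynamic power per transistor ~ C * V^2 * f.
    All returned values are relative to the pre-shrink chip (1.0).
    """
    transistors = 1.0 / s ** 2          # area per transistor shrinks by s^2
    capacitance = s
    voltage = s
    clock = 1.0 / s
    power_per_transistor = capacitance * voltage ** 2 * clock  # equals s^2
    total_power = transistors * power_per_transistor           # stays at 1.0
    return {
        "transistors": transistors,
        "total_power": total_power,
        # Throughput if only the transistor count is exploited:
        "perf_per_watt_same_clock": transistors / total_power,
        # Throughput if the higher clock is exploited as well:
        "perf_per_watt_faster_clock": transistors * clock / total_power,
    }


result = ideal_shrink(0.7)
for name, value in result.items():
    print(f"{name}: {value:.2f}x")
```

With s = 0.7 the transistor budget roughly doubles at constant power, and perf/watt lands between about 2x and about 2.9x depending on whether the clock gain is realized, which matches the "double/triple" range.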

Here’s some info:

Thanks for the pointers, people, much appreciated.

The significance of 5 teraflops per watt is that it matches the power-efficiency of the human brain (according to my calculations based on theoretical neuroscience, which roughly match Hans Moravec’s estimate). After that date, it would be possible to simulate human brains (assuming a very optimized simulation rather than a molecular simulation) faster than humans for the same power. That is my preferred threshold for the “infinity point” (although of course there is no such thing as infinity, that’s just a term), since we could essentially have accelerated uploads (relative to humans), even assuming that AI research has failed by then.

I can now project a date according to NVIDIA’s own data, which is cool. Yay for the supercomputing company, although their motivation is scientific computing, they help us AI researchers as well (^_^).

@tera: Are you referring to 5 gflops/watt? I calculated 6 for GTX 580 (which I use), the number would be different for current Tesla C2090’s double-precision performance I suppose.

@hyqneuron: Good point about GPUs and CPUs, I suppose they are bound by the same technological improvements. It would be nice if you could recall that article :)

You should check out this. If our semi-conductor technology continues to develop in the way it has for the past few decades, it’s never going to reach 5 TFLOPs/watt.

Except that even the best human brains can manage at most a few dozen flops, and there are no few (some of them installed in Congresscritters) that can’t even handle integer operations :-)

The 5 teraflops/watt (100 teraflops) figure seems either way too low or way too high to me, depending on how you define “flops”. On a cellular level, it’s way too low, since we’re talking about 10^15 synapses refreshing every fraction of a second. On a functional level, the CPU-heaviest task humans are capable of performing is image processing / image recognition, and we don’t even reach 1 gigaflop there. Most of the “stuff” we do on a daily basis does not even rise above the kiloflop mark. Most of the brain matter (those 10^15 synapses) is not so much a CPU as a hard drive with a sophisticated data organization system, whose total capacity has been estimated as low as 10 gigabytes. We get away with that little by storing most of the information in a highly specialized, lossy form. Say, when you memorize a picture of the Mona Lisa, you don’t do JPEG-like pixel-by-pixel compression (this might take hundreds of kb of valuable memory). You extract basic shapes, significant edges and approximate colors, and commit those to memory, possibly taking less than a kilobyte.

But we have to recognize that in doing tasks like image recognition, the brain is accomplishing the task by a method that (as far as we can tell, anyway), doesn’t actually involve floating point operations or anything of the sort. Which is a bit of a conundrum: in order to do brain simulations on computers, you have to use billions of flops to model the states of even a small clump (one cortical column, say) of neurons & their synapses, but you put all the neurons in the human brain to work doing math problems, and it can at best manage to do a few flops.

Despite the conventional wisdom, brains are not computers.

True. It could be more accurate to say that, for most tasks that the brain can do, we can figure out a way to do them on a computer in real time using 1 gigaflop or less.

The biggest stumbling block is to build a system that does the “figuring out” by itself. And there’s no apparent reason to think that we’re flop-limited in this aspect. It’s just that the design we have to come up with has to be very complex.

On the other hand, even if we had the petaflops necessary to do direct modelling on a cellular level, the prospects of that modelling seem extremely underwhelming to me. You can’t configure the simulation to act like a trillion neurons, launch it and expect it to behave like an intelligent human. For a human being, it takes several years of direct interaction with the environment after the “initial boot-up” to create the links and populate the data structures necessary to demonstrate intelligence. Nor does it seem likely to me that we could “disassemble” an actual adult human brain with all its data structures and upload it into the computer. (The best way to go would be to freeze an executed criminal’s brain in liquid nitrogen and then to scan it layer by layer, I suppose?)

I have wondered about that myself, but hey, they just pay me to write code. Especially since if you want to build human brains, the job can be done in 9 months, and if necessary contracted out to unskilled labor :-)

I really think any practical benefits would come from a much lower-level understanding of neural systems. Suppose for instance you could reproduce the target acquisition & flight control system of a dragonfly?

Interesting topic. Since it looks like you are quite well informed on that subject, could you tell me where your figures come from?

Do you have some blogs/website to recommend on that subject?

Thanks a lot!

Can we expect about 100% improvement in computing speed/power efficiency per year?

“speed/power efficiency” really depends on what kind of parallel tasks are considered. Modern GPU architecture gives the best performance scalability (with the number of transistors) for data-parallel problems, and it is VERY INEFFICIENT for task-parallel problems, which can be scaled up effectively only by an effective MIMD machine like a multi-core i7 or Opteron (modern GPUs are far from being an effective MIMD device). Btw, MIMD scales up effectively for data-parallel problems as well. For example, even for some 3D rendering problems a dual X5650 may dramatically outperform a GPU cluster (given the same price range). My dream GPU is a kilo-scalar-core machine with totally independent scalar cores; so far only Intel/AMD are on this track. The irony is that NVIDIA has developed the CUDA platform, which is a perfect fit for a MIMD architecture, so maybe they have in mind to eventually come up with an effective MIMD GPU after all (so far modern GPUs are good only for SIMD-friendly algorithms).



Well, I’m a human-level AI researcher so I do have an active interest in these numbers. The place to start is Hans Moravec’s now classical paper:

I have myself calculated some bounds on computation speed of the human brain based on state-of-the-art neuroscience and I have more or less matched the same figure for an adult human brain. So it turns out that Moravec actually made a pretty good prediction! Note that an infant’s brain has much larger connectivity, which might easily increase this number to several petaflops.

With the GPGPU architecture, of course, there could be a memory problem. In even the simplest neural network model, you store a weight per synapse. God knows what would be required for a complete neuron model (right now I suspect that the models themselves are inadequate, the number I calculated assumes that the model is optimal and complete, which is not a very easy thing to achieve in practice!). The memory problem again! Will the memory wall block the singularity? That calls for another forecast.
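To put a rough number on the memory problem: the sketch below takes the 10^15 synapses figure quoted earlier in the thread and assumes one 4-byte (float32) weight per synapse, which is my assumption for the simplest possible model, not a claim about any real neuron model. The 1.5 GB is the memory of a standard GTX 580.

```python
def weight_storage_bytes(num_synapses, bytes_per_weight=4):
    """Memory needed to hold one weight per synapse.

    bytes_per_weight=4 assumes float32 weights (illustrative choice).
    """
    return num_synapses * bytes_per_weight


SYNAPSES = 10 ** 15                  # figure quoted earlier in the thread
GTX580_BYTES = 1.5 * 1024 ** 3       # standard GTX 580: 1.5 GB on board

total_bytes = weight_storage_bytes(SYNAPSES)
cards_needed = total_bytes / GTX580_BYTES

print(f"weights alone: {total_bytes / 1e15:.0f} petabytes")
print(f"GTX 580 cards needed just for storage: {cards_needed:,.0f}")
```

Even this bare-bones model needs petabytes for the weights alone, i.e. millions of cards' worth of memory, so capacity (and the bandwidth to touch it all every timestep) looks like a tighter constraint than flops.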

Note that, the shared memory architecture actually has some advantages over fully distributed fine-grain memory of the human brain. Marvin Minsky has said on our mailing list that the shared memory architecture might actually enable far slower computers to emulate the brain.

With the 85% per year development in speed-power efficiency however, raw computing power efficiency will be matched in 2023. I’ve already written this year in a paper, so corrections are most welcome!
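A quick sanity check of that date, as a sketch: solve start * (1 + g)^t = target for t. The 6 gflops/watt start and the 5 teraflops/watt target are the figures used in this thread; the base year of 2012 is my assumption, since the thread does not state one.

```python
import math


def years_to_reach(start, target, annual_growth):
    """Years of compound growth needed to go from start to target.

    annual_growth is a fraction, e.g. 0.85 for +85% per year.
    """
    return math.log(target / start) / math.log(1.0 + annual_growth)


START_GFLOPS_PER_WATT = 6.0       # GTX 580 estimate from this thread
TARGET_GFLOPS_PER_WATT = 5000.0   # 5 teraflops/watt brain estimate
BASE_YEAR = 2012                  # assumed starting year, not from the thread

for growth in (0.85, 1.00):
    t = years_to_reach(START_GFLOPS_PER_WATT, TARGET_GFLOPS_PER_WATT, growth)
    print(f"+{growth:.0%}/year: {t:.1f} years, around {BASE_YEAR + t:.0f}")
```

At 85% per year the gap of roughly 833x takes a little under 11 years to close, so a date around 2023 is consistent with starting the clock at about 2012; the result shifts one-for-one with the base year.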

That’s an amazing threshold. What happens is that, if we manage to write the right brain sim program by then, it will be always cheaper to run an upload than to run a human to solve an intellectual problem (actually 5 times cheaper, but it is also cheaper than just the brain!). Which means that, you’ll fork several uploads to solve a problem or even give them autonomy and become a meta-person. Almost everyone will do it due to the enormous economic incentive. Will the supercomputer company bring us the singularity? I sure hope so! :))))

In my opinion, that could be an important threshold, because time compression effects are really felt after that, intelligence explosion will have truly begun. Or, it will wait until the computational neuroscientists get their whole brain emulation programs right!

By the way, I am not referring to a molecular-level simulation; molecular packing, chemical synapses, etc. won’t be simulated exactly in this scenario, just abstract computation. The Blue Brain approach would be much more costly (and I don’t think it would even work; it looks more like a fun supercomputing project than a serious whole brain emulation project).

Oh, yes, an MIMD GPU would be too sweet :)

How would you go about writing the right brain sim program, and, more specifically, initializing it with data? You can simulate an infant’s brain, but you can’t ask an infant to solve intellectual problems. Even if the infant had an adult-sized brain, it would be as useless for solving problems as a computer without BIOS and with a blank hard drive.

Can you make a rough outline of the way that brain sim would work? Do you expect to have one thread per neuron? What kind of operations need to be done, and how much data is exchanged between threads? With that kind of data, we can nail down the true performance of, say, GTX 580, as a brain simulator, identify bottlenecks and make predictions.


Take a look at Randal Koene’s work at, he is one of the researchers who is taking brain scanning seriously.

There are multiple promising approaches towards ultra-high resolution non-destructive brain scanning. I had found the quantum dot approach to be very promising, however the toxicity of quantum dots is not known yet IIRC. Nanotech to the rescue. Previously, it was thought that non-destructive scanning was physically impossible, since nobody thought we could get inside the brain.

We would likely scan an adult’s brain, although ultimately we would like to scan infants’ brains as well, why not? Or perhaps, ultimately, even develop the brain entirely in silico from the genome. That kind of research isn’t science fiction any more. If you search for “whole brain emulation” and “mind uploading” you will likely find many more relevant researchers. There is an oldish report written by Anders Sandberg and Nick Bostrom, the Brain Emulation Roadmap Report, which does refer to several scanning approaches and neuron models; however, I’m going to say that the right model might not be there. It is a good review with many useful references, though. The report also gives the computational requirements of different neuron models, which may be useful for making predictions.

Randal Koene told us at the AGI-10 conference that 1st-order spiking neuron models are now accurate enough but the dynamics aren’t yet (models of plasticity may be insufficient), which is to say long-term memory is not in those models yet, so there is still a lot to do. We need people who are enthusiastic about computational neuroscience, to say the least.

It’s not my area of research; I do have some loose ideas based on recent experimental discoveries and theoretical neuroscience, though generally speaking I’m hopeful about the progress in the next decade.

My interest in this matter is the following: suppose that we AI researchers failed. (For instance our current programs WON’T work) Even then, bottom-up whole brain emulation approaches could yield a bioinformation based AI, which is an upload. Once these uploads are cheap and fast enough, there will be an intelligence explosion. Wouldn’t you have your brain scanned if it took only $1000 to do it? The cost of brain scanning technology, once it is available, will decrease rapidly. For one thing, many people would want a backup.

I found Randal Koene’s lecture on whole brain emulation on the AGI-10 website, this is extremely cool, fast forward to the future. Please watch it if you are interested:
Randal Koene, Whole Brain Emulation: Issues of scope and resolution, and the need for new methods of in-vivo recording

The quantum dot approach is interesting, though it seems to me that you need to get them inside synapses and you need the right kind that will bind to information-carrying proteins, and no one does that yet. It does not look like the kind of technology we could have by 2023. Progress in biology is extremely slow.

Scanning and evolving an infant’s brain is an extremely bad idea. There are lots of things that can go wrong if the brain and the environment aren’t just right. Last thing we want is to create a psychotic sociopath AI with superhuman intelligence.

I’ll need to read through those other links.

Well, they are pretty fixated on getting it right; they even intend to insert nanobots in the brain that will do the imaging, or grow new tissue that will. I think there was an issue on such far-fetched methods in a brain-machine interface journal, which I can’t find right now. We’ll see eventually :) I thought we could get some new kind of MRI to work for ultra-high res. imaging…

Of course it will be easier to get post-mortem procedures to work well enough, but I’d really prefer to get scanned while I’m alive. :)