Userbase of dedicated GPUs: how many?

I presume that most desktop/laptop PCs sold these days have at least some form of limited GPU capability.

But I’d love to know the proportion of users that have a dedicated GPU, and how many of those are CUDA-enabled.

I’m thinking of postponing or abandoning development of a CPU version of my software and putting all my effort into a GPU version, but obviously I don’t want to limit my own userbase.

Any comments on that? Some graphs would be great. Perhaps within 10 years, almost everyone will have a GPU capable of running GPGPU applications?

About 3% of gamers have multi-GPU setups (including SLI and/or a dedicated spare card for PhysX). Of those multi-GPU users, 95% are NVIDIA and 5% AMD.

You can get exactly that kind of chart and analysis by looking at the Steam Survey. It is obviously biased towards the game-playing demographic, but the remainder of PC users (still a huge majority) likely just read email and browse the web on cheesy Dell econoboxes, so the Steam Survey is probably what you should focus on.

Today, your userbase depends on which technology you target. I think nearly all laptop and desktop computers less than five years old have a GPU made by one of three companies: Intel, NVIDIA, or ATI/AMD. If you go with CUDA, then you are limited to NVIDIA GeForce 8 and later GPUs, which have been on sale since around the start of 2007. If you go with OpenCL, then you get NVIDIA GeForce 8 and later, plus the ATI Radeon HD 4xxx and later. I don’t believe there is any API that provides hardware acceleration on Intel GPUs (they tend to be much less capable than NVIDIA or ATI parts).
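
Whichever API you choose, you don’t have to commit to one path at build time. Here’s a minimal sketch of runtime detection, assuming the CUDA runtime library is available at link time (the fallback logic and function name are mine, hypothetical, not from any poster):

```c
/* Probe for a usable CUDA device at startup; fall back to a CPU path
 * otherwise. GeForce 8 and later report compute capability >= 1.0. */
#include <stdio.h>
#include <cuda_runtime.h>

static int cuda_device_available(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return 0; /* no driver or no device present */
    for (int i = 0; i < count; ++i) {
        struct cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess && prop.major >= 1)
            return 1;
    }
    return 0;
}

int main(void)
{
    if (cuda_device_available())
        printf("Using the GPU code path.\n");
    else
        printf("Falling back to the CPU code path.\n");
    return 0;
}
```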

In 2011, we’ll see updated CPUs from both Intel and AMD that incorporate wider, more GPU-like SIMD units. Most likely, these CPU capabilities will eventually be exposed through OpenCL drivers on both platforms. So, long term, OpenCL and its future revisions are a good bet, even if I think CUDA is more productive today.
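
You can already see this direction in the API itself: OpenCL lets you ask each platform whether it exposes a CPU device. A minimal sketch, assuming an OpenCL SDK (headers plus an ICD loader) is installed:

```c
/* List how many CPU devices each OpenCL platform exposes. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS)
        return 1;
    if (nplat > 8)
        nplat = 8; /* we only fetched the first 8 platform IDs */
    for (cl_uint p = 0; p < nplat; ++p) {
        cl_uint ndev = 0;
        /* Pass NULL for the device list: we only want the count. */
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_CPU,
                           0, NULL, &ndev) == CL_SUCCESS && ndev > 0)
            printf("Platform %u exposes %u CPU device(s).\n",
                   (unsigned)p, (unsigned)ndev);
    }
    return 0;
}
```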

AMD is interested in OpenCL as a common standard, but Intel is not. Intel prefers Intel-designed solutions that it can control and monopolize (and license out just enough to avoid legal monopoly issues). The never-released Larrabee could have run compiled OpenCL code, but it was never going to… it was designed to use the Larrabee New Instructions and stay proprietary.

AMD chose OpenCL for GPU computing not because they liked it, but because their own proprietary languages (Stream and Brook+) were simply technically inferior and didn’t have the momentum of CUDA… when you’re far behind in the race, it’s better to jump onto the bandwagon… better to submit to being a passenger than to be left behind entirely.

Intel doesn’t have that problem: it dominates the CPU market, so if Intel’s tools are technically as good as OpenCL, Intel will win on numbers.

The best bet is to use “Ocelot” when a CUDA GPU is not available. Ocelot is being optimized for the CPU by Gregory and his team (but I can’t guarantee that… only Greg can).

As I see it, CUDA provides a lot of scope for binary translation, which Ocelot has exploited well.

If Ocelot translates effectively to a CPU backend and other backends (like Cell, AMD, etc.), then we can retain CUDA as the programming model for the future and keep a binary translator handy to migrate to different platforms… Hmmm… and it’s free… That sounds like a panacea for parallel woes…
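
Part of the appeal is that the application itself doesn’t change. As a rough illustration (my own sketch, not taken from the Ocelot documentation): an ordinary CUDA program like the one below needs no source edits; as I understand the Ocelot workflow, you link against libocelot in place of libcudart and the kernel’s PTX gets translated to run on the CPU backend.

```cuda
/* An ordinary CUDA program. As I understand it, linking with
 * -locelot instead of the default -lcudart lets Ocelot translate
 * the kernel's PTX for a CPU backend, with no source changes. */
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= a;
}

int main(void)
{
    const int n = 1024;
    float *d = NULL;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaThreadSynchronize(); /* era-appropriate sync call */
    printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(d);
    return 0;
}
```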

Even if OpenCL isn’t the ultimate winner, eventually we’re going to see all of these vector instructions wrapped behind some common data-parallel language or language extension. Auto-vectorization of for loops in C/C++ compilers is just not going to cut it. :)

Oh my, what a nasty response!!! Couldn’t have said it better myself, though!!! :)

CUDA rocks, and OpenCL is just a solution for vendors with crappy GPU software frameworks… I simply can’t see why people would invest time in OpenCL… wow, I’d be able to run my code on both the GPUs and CPUs in my system using one kernel (yeah, right…). And then all that’s left is to wait for the GPU to finish more slowly (since it’s OpenCL and not CUDA), and then wait another 50x as long for the CPU to finish (running its implementation of OpenCL).

Indeed a very good idea…

my very early morning rant… :)

eyal
