AMD and OpenCL

Hi,
If I weren't too paranoid, I'd think AMD had deserted their GPUs.
“AMD does reverse GPGPU, announces OpenCL SDK for x86”

http://arstechnica.com/hardware/news/2009/…sdk-for-x86.ars

eyal

Yes, it makes no sense at this point, because the main problem will be targeting the GPU, and that's a matter of algorithms more than of code optimization, so developing on a CPU without any access to a real GPU is pointless.

You may develop on a CPU (for CUDA), but you should test, compare and challenge your code on a GPU at some point.

I hope AMD will have a good OpenCL GPU implementation as soon as possible; we all need nVidia to be challenged :-)

Actually, having a CPU code-path for OpenCL is very smart. The most annoying thing about using CUDA in my current work is having to provide separate CPU code paths for systems which do not have a GPU. I am not very experienced in writing multicore SSE code, but am quite happy writing CUDA code. I have long wished for a compiler option to nvcc which could convert my CUDA code to an optimized CPU implementation. (There was a flurry of work on this a year ago, including some academic papers and rumors of nvcc being updated to do this, then silence. I assume that means it turned out to be harder than it sounds.)
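To make concrete what I mean by the separate CPU code path, here is a rough sketch (hypothetical saxpy example, nothing tuned, just the dispatch logic I currently have to maintain by hand):

```
#include <cuda_runtime.h>

__global__ void saxpy_kernel(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// the hand-written CPU path that has to be kept in sync with the kernel
static void saxpy_cpu(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

void saxpy(int n, float a, const float *x, float *y)
{
    int devices = 0;
    if (cudaGetDeviceCount(&devices) == cudaSuccess && devices > 0) {
        float *dx = 0, *dy = 0;
        cudaMalloc((void**)&dx, n * sizeof(float));
        cudaMalloc((void**)&dy, n * sizeof(float));
        cudaMemcpy(dx, x, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, y, n * sizeof(float), cudaMemcpyHostToDevice);
        saxpy_kernel<<<(n + 255) / 256, 256>>>(n, a, dx, dy);
        cudaMemcpy(y, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dx);
        cudaFree(dy);
    } else {
        saxpy_cpu(n, a, x, y);   // separate path for machines without a GPU
    }
}
```

A CPU-capable OpenCL (or a CPU target in nvcc) would let that else-branch disappear.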

Once the CPU-only implementations of OpenCL get good, there will be even greater incentive for people to learn the language, because it will benefit all of their users, rather than just those with supported GPUs. And you will only have to implement your algorithm once, not twice. (Or three times. AMD’s Bulldozer CPUs should be a very interesting fusion of CPU and GPU design.)

The fact that they shipped a CPU version first is what I find weird… as if their GPU version (OpenCL or not) is not good enough.

In any case, I think this dual approach will become unneeded. You currently need both CPU and GPU versions because the GPU is not mainstream.

Once it's mainstream and your GPU code runs 50x faster, why would you want CPUs? It will only make your code slower…

The scenario is here today… if you have X tasks that you run on the GPU and it takes time, why wouldn't you offload, say, 5-20% of the work to your two quad-core CPUs? You'd have to have a really good reason to do so; it might only set you back, performance-wise…
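Just to make concrete what such an offload would look like (a rough sketch, arbitrary 15% CPU share, assuming the data already sits on both the host and the device):

```
#include <cuda_runtime.h>

__global__ void scale_kernel(int n, float a, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] *= a;
}

// host_y and dev_y are assumed to already hold the same data on each side
void scale_hybrid(int n, float a, float *host_y, float *dev_y)
{
    int n_cpu = n * 15 / 100;          // arbitrary 15% CPU share
    int n_gpu = n - n_cpu;

    // the launch is asynchronous, so the loop below overlaps with the GPU work
    scale_kernel<<<(n_gpu + 255) / 256, 256>>>(n_gpu, a, dev_y);

    for (int i = n_gpu; i < n; ++i)    // the CPU handles the tail slice
        host_y[i] *= a;

    cudaDeviceSynchronize();           // wait for the GPU part to finish
}
```

And then you still have to merge the results and tune that split per machine…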

my 1 cent :)

eyal

If you need double precision in your kernel, then using both CPU cores and GPU makes a lot of sense.

-gshi

The GPU path and the CPU path are totally different.

Good CPU-optimized code (with intensive SSE use) may be 4X faster than "basic" C CPU code, so if you have a quad-core, "basic" C code will end up at the same level of performance as a single-threaded SSE-optimized application: that makes absolutely no sense to me, especially for compute-intensive applications.

So your CPU path will be totally CPU-optimized, and may even be optimized for Intel SSE instead of AMD SSE for maximum performance (Core microarchitecture optimizations, trying to obtain 2 SSE operations per cycle per core).
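For example, a hand-vectorized axpy loop (a rough, untuned sketch, with unaligned loads) already looks nothing like "basic" C:

```
#include <xmmintrin.h>

void saxpy_sse(int n, float a, const float *x, float *y)
{
    __m128 va = _mm_set1_ps(a);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);   // one mul + one add on 4 floats
        _mm_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i)          // scalar tail for the leftover elements
        y[i] = a * x[i] + y[i];
}
```

And that's before alignment, prefetching or per-microarchitecture scheduling tricks.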

Good GPU-optimized code won't look the same: there's no SSE (naturally), no cache, a different memory access path, and it usually doesn't even use the same algorithms.

So producing CPU-oriented code today, whether "basic" C code or SSE optimized by hand, won't give you any insight into GPU-oriented code, and for my 2 cents, I won't start producing OpenCL code just because I am not able to understand or use libpthread :-)

…Of course, unless AMD’s OpenCL SDK actually has some optimizations for specific platforms.

By the way, is this OpenCL release really just for CPUs? I thought they were supporting both CPUs and GPUs.

You could optimize code generation for an architecture, say an Intel Core 2, an AMD Phenom or an nVidia GT200 GPU, and there's no doubt that nVidia, Intel and AMD/ATI will do their best to reach the best performance level on any algorithm we may throw at their CPUs or GPUs.

But when you develop for nVidia's GPUs, you don't use the same algorithm as in the CPU-optimized code, because a CPU-optimized algorithm may be 10X slower on the GPU!!!

If you throw a CPU-optimized algorithm at the GPU, whatever your best efforts at hand optimization, you will end up running slower than plain C CUDA code that uses a GPU-optimized algorithm.
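A concrete (hypothetical) illustration of that 10X: partitioning the data the way you would per core on a CPU, one contiguous chunk per thread, wrecks memory coalescing, while the GPU-oriented rewrite interleaves the accesses instead:

```
// Both kernels sum the same array. The first partitions it CPU-style
// (one contiguous chunk per thread); the second interleaves accesses so
// consecutive threads of a warp read consecutive floats.
__global__ void sum_chunked(const float *in, float *out, int n, int chunk)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float s = 0.0f;
    for (int i = tid * chunk; i < (tid + 1) * chunk && i < n; ++i)
        s += in[i];            // neighbouring threads hit addresses far apart
    out[tid] = s;
}

__global__ void sum_interleaved(const float *in, float *out, int n, int nthreads)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float s = 0.0f;
    for (int i = tid; i < n; i += nthreads)
        s += in[i];            // coalesced: the warp reads a contiguous segment
    out[tid] = s;
}
```

Same C, same arithmetic, completely different memory behaviour on the GPU.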

You may even use different algorithms for nVidia's GPUs and ATI's GPUs, because of their architectural differences!

For me, that's a problem with OpenCL: the C source code may be generic, but the algorithms must be fine-tuned for each platform (CPU, ATI GPU or nVidia GPU), with a hint of inline SSE code (or macros) for CPUs. There's nothing generic about exploiting the full potential of a modern computing platform!