OpenCL - hmm... not so interesting. What is your take on it?


I have been reading through the OpenCL spec. As it says itself, the spec is very much ‘C’-oriented.

It would have been far, far better if they had made their spec object-oriented.
Anyway, what they want is to abstract away the heterogeneity of compute devices by exposing them as homogeneous compute components.
I feel OOP would have done a better job…
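To make the OOP idea concrete, here is a minimal C++ sketch of what such a design could look like: an abstract device interface with interchangeable back ends, so client code never cares which device it runs on. All names here (`ComputeDevice`, `SerialCpuDevice`, `ReverseCpuDevice`, `square_all`) are hypothetical and are not part of OpenCL, whose API is plain C functions operating on opaque handles such as `cl_device_id` and `cl_command_queue`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical kernel type: invoked once per work-item index.
using Kernel = std::function<void(std::size_t)>;

// Abstract device interface: callers see every device the same way.
class ComputeDevice {
public:
    virtual ~ComputeDevice() = default;
    virtual std::string name() const = 0;
    // Invoke `kernel` once for every index in [0, n).
    virtual void launch(std::size_t n, const Kernel& kernel) = 0;
};

// Hypothetical serial CPU back end.
class SerialCpuDevice : public ComputeDevice {
public:
    std::string name() const override { return "serial-cpu"; }
    void launch(std::size_t n, const Kernel& kernel) override {
        for (std::size_t i = 0; i < n; ++i) kernel(i);
    }
};

// Second hypothetical back end that visits indices in reverse, standing in
// for a device with a different execution order (a GPU back end would
// override launch() in the same way).
class ReverseCpuDevice : public ComputeDevice {
public:
    std::string name() const override { return "reverse-cpu"; }
    void launch(std::size_t n, const Kernel& kernel) override {
        for (std::size_t i = n; i-- > 0;) kernel(i);
    }
};

// Client code depends only on the interface, never on a concrete device.
std::vector<float> square_all(ComputeDevice& dev, const std::vector<float>& x) {
    std::vector<float> out(x.size());
    dev.launch(x.size(), [&](std::size_t i) { out[i] = x[i] * x[i]; });
    return out;
}
```

Because each work-item writes only its own output slot, the result is the same regardless of which device (and which execution order) is plugged in.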

Also, the spec suddenly talks about “image_t”. This is a total departure from its original purpose.
They should have moved that to an appendix.

I really think the spec is clumsy and overall irritating. What do you guys think?

I think Microsoft is going to release a .NET-based compute language and totally disregard OpenCL.
Microsoft has promised GPU acceleration for Windows 7. It will be interesting to see what they do.
Do you have any Microsoft links on this?

Best Regards,

Well, you have to get from machine code to OOP somehow. I predict a nice Apple-style layer of OOP niceties on top. That way you could take whichever OOP route you like, be it ObjC, Java, .NET, C++ or whatever.

Not that I’ve read the spec yet … but yeah, you know what I mean.
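A minimal sketch of the kind of OOP layer predicted above, assuming a made-up C-style handle API: `queue_create` and `queue_release` are hypothetical stand-ins (not OpenCL calls), defined here only so the example runs. The C++ `Queue` class owns the handle and releases it automatically, which is the sort of nicety a binding would add on top of a C spec.

```cpp
#include <cassert>
#include <utility>

// ---- Hypothetical C-style, handle-based API (stand-in, not OpenCL) ----
typedef struct queue_s { int device_id; } *queue_t;

static int g_live_queues = 0;  // counts handles not yet released

queue_t queue_create(int device_id) { ++g_live_queues; return new queue_s{device_id}; }
void queue_release(queue_t q) { --g_live_queues; delete q; }
int live_queues() { return g_live_queues; }

// ---- Thin OOP layer: RAII ownership, so a release can't be forgotten ----
class Queue {
public:
    explicit Queue(int device_id) : h_(queue_create(device_id)) {}
    ~Queue() { if (h_) queue_release(h_); }

    Queue(const Queue&) = delete;             // exactly one owner per handle
    Queue& operator=(const Queue&) = delete;

    Queue(Queue&& other) noexcept : h_(std::exchange(other.h_, nullptr)) {}
    Queue& operator=(Queue&& other) noexcept {
        if (this != &other) {
            if (h_) queue_release(h_);
            h_ = std::exchange(other.h_, nullptr);
        }
        return *this;
    }

    int device_id() const { return h_->device_id; }

private:
    queue_t h_;
};
```

The same pattern would carry over to real OpenCL objects: a wrapper's destructor would call the matching release function, e.g. `clReleaseCommandQueue`.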

Excellent topic though!

On the diagram there you can see that there’s a PTX layer beneath nVidia’s implementation of OpenCL. This has sparked a thought: could it be possible to write code basically in CUDA and have it compile to something like an “OpenCL PTX” that would be portable to ATI/Cell/Larrabee etc.? I understand this would be limited to a subset of “real” CUDA (which targets nV GPUs).

That, or some other high-level wrapping. I’m not a big fan of low-level coding, and I imagine a lot of non-CS scientists aren’t either.

I’m a CS scientist I guess - and even I don’t like to be slogging through a lot of low-level code if I don’t need to - I’d like the code to get out of the way of my brilliant CS ideas :)

I’m mostly bummed about the lack of templates in OpenCL. I’ve become extremely fond of them for easily creating kernels tuned to specific tasks or profiles, especially in libraries. Mark Harris also uses them a lot in a similar way in his code (look at his SDK examples, or CUDPP).

OpenCL is back to the days of #define macros and cut-and-pasted code splatted everywhere.
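To illustrate the difference, here is a minimal host-side C++ sketch (not kernel code) using a hypothetical block-sum routine: the template version gets a variant for any block size on demand, while the macro version needs one hand-written expansion per variant. In OpenCL C the usual substitute is injecting a value at build time, since `clBuildProgram` accepts preprocessor options such as `-D BLOCK=128`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Macro style: each tuned variant is a separate textual expansion.
// Sums consecutive chunks of BLOCK elements (the last chunk may be short).
#define DEFINE_BLOCK_SUM(BLOCK)                                              \
    std::vector<float> block_sum_##BLOCK(const std::vector<float>& in) {     \
        std::vector<float> out;                                              \
        for (std::size_t start = 0; start < in.size(); start += BLOCK) {     \
            float acc = 0.0f;                                                \
            for (std::size_t i = start; i < start + BLOCK && i < in.size(); ++i) \
                acc += in[i];                                                \
            out.push_back(acc);                                              \
        }                                                                    \
        return out;                                                          \
    }

DEFINE_BLOCK_SUM(2)  // defines block_sum_2
DEFINE_BLOCK_SUM(4)  // defines block_sum_4

// Template style: the compiler instantiates a variant for any Block on demand,
// and the block size is a compile-time constant inside the body.
template <std::size_t Block>
std::vector<float> block_sum(const std::vector<float>& in) {
    static_assert(Block > 0, "block size must be positive");
    std::vector<float> out;
    for (std::size_t start = 0; start < in.size(); start += Block) {
        float acc = 0.0f;
        for (std::size_t i = start; i < start + Block && i < in.size(); ++i)
            acc += in[i];
        out.push_back(acc);
    }
    return out;
}
```

Both give identical results; the difference is that `block_sum<N>` exists for whatever `N` a caller asks for, whereas the macro variants exist only for the sizes someone remembered to expand.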

?! No templates ?!

They better have good reasons …

Thanks for all your replies.

There is no C++ and hence no templates.

I learnt C++ a few months back and then learnt C#, and I can see how programming is evolving…

I had become a huge MS fan in the span of 3 months.

Now, OpenCL looks to be taking us a step backwards…

OpenCL claims to be the fastest-developed spec… Probably that’s the problem. People were in a hurry to get something out that satisfies everyone. Microsoft did NOT participate in OpenCL. I guess they will do something of their own. It will be good to see what they come up with.

Best Regards,


Whatever they come up with will not be portable, and it will not have the benefit of OpenCL. The (only) benefit of OpenCL is that it will run on more devices, not just GPUs. That is where the real strength of OpenCL might be. Also, as far as I understood, you do not specify grid & block sizes; those are determined by OpenCL. That might also be a reason for not supporting templates.

OpenGL and Direct3D survive together.…GL_and_Direct3D

OpenGL is an open, ‘C’-based API.

Direct3D is a Microsoft API.

I think the implicit way is the AMD way (where the number of threads spawned is determined by the AMD drivers).

The explicit one is the CUDA way!

Just my inference from the statement above and my peripheral understanding of AMD Stream.
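For reference, in OpenCL the host does pass a global work size to `clEnqueueNDRangeKernel`; it is the local (work-group) size that may be passed as NULL, leaving that choice to the implementation. The sketch below contrasts the two styles in plain C++. `grid_blocks` is the usual CUDA-style round-up, and `pick_local_size` is a hypothetical heuristic (real drivers may choose differently) that respects the OpenCL 1.x rule that the global size must be a multiple of the local size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Explicit (CUDA-style): the caller fixes the block size and the grid is
// rounded up, so the kernel needs an `if (i < n)` guard for the tail.
std::size_t grid_blocks(std::size_t n, std::size_t block) {
    assert(block > 0);
    return (n + block - 1) / block;
}

// Implicit (OpenCL with local_work_size == NULL): the runtime chooses.
// Hypothetical heuristic: the largest divisor of the global size that does
// not exceed the device's maximum work-group size.
std::size_t pick_local_size(std::size_t global, std::size_t max_wg) {
    assert(global > 0 && max_wg > 0);
    for (std::size_t l = std::min(max_wg, global); l > 1; --l)
        if (global % l == 0) return l;
    return 1;
}
```

With a prime global size this heuristic degenerates to work-groups of one item, which is one reason performance-sensitive code often pads the global size and picks the local size itself.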

OpenCL is a bit behind CUDA, but I’m sure it will get there. I have been using the CUDA driver API for a while and must say that I now prefer it to the runtime API. For production products that need to be integrated into larger systems, it is more convenient (at least for me). The only thing I’m missing is the emulator.

Usually, when you are trying to squeeze every last drop of performance from your hardware, you want access that is as low-level as possible, which is always counterbalanced by wanting something more efficient than programming in bytecode. I think both CUDA and OpenCL hit the sweet spot. ATI had Brook+, which was too far away from the hardware to get the performance you needed, or CTM, which is basically programming in assembler (or PTX…) and isn’t much fun.

For compute languages to work, there has to be one language that works on all hardware, or else very few will use it. Nvidia is very aware of this, and I guess that’s why they are fully supporting OpenCL. For my company, using CUDA isn’t a problem, since we control the hardware our customers use. But for most products it will be critical, and thus most will use OpenCL in a year or two. And of course there will be abstraction layers and all kinds of neat tools, but, as with OpenGL today, if you are going to do heavy optimization you have to get down to the lowest level.

Thanks for your note.


I read in some computer book (“Zen of Graphics Programming”?): “The best optimizer is between your ears.”

Unless you are using the best parallel algorithm for the problem, there is no point diving into low-level details for performance.

We had this experience. We were getting around 70x to 80x speedup with one algorithm. Then we designed an entirely new one, and that gave us 120x to 220x at peak… and we had not even got our hands on decuda yet.

Just FYI.
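The post doesn’t say which algorithm was involved, so as a stand-in, here is a well-known case of the same effect: two parallel prefix-sum (scan) algorithms, simulated serially in C++ with an addition counter. Hillis-Steele performs O(n log n) additions, while Blelloch’s work-efficient scan performs 2(n - 1). The function names are my own, and the power-of-two restriction on the Blelloch version is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hillis-Steele inclusive scan: log2(n) steps, each step a parallel pass
// (simulated with a double buffer). Counts every addition in `adds`.
std::vector<int> scan_hillis_steele(std::vector<int> a, std::size_t& adds) {
    adds = 0;
    const std::size_t n = a.size();
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        std::vector<int> next = a;
        for (std::size_t i = offset; i < n; ++i) {
            next[i] = a[i] + a[i - offset];
            ++adds;
        }
        a.swap(next);
    }
    return a;
}

// Blelloch work-efficient exclusive scan: an up-sweep (reduction tree)
// followed by a down-sweep. Requires n to be a power of two here.
std::vector<int> scan_blelloch(std::vector<int> a, std::size_t& adds) {
    adds = 0;
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t d = 1; d < n; d *= 2)            // up-sweep
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d) {
            a[i] += a[i - d];
            ++adds;
        }
    a[n - 1] = 0;                                     // identity at the root
    for (std::size_t d = n / 2; d >= 1; d /= 2)       // down-sweep
        for (std::size_t i = 2 * d - 1; i < n; i += 2 * d) {
            int t = a[i - d];
            a[i - d] = a[i];
            a[i] += t;
            ++adds;
        }
    return a;
}
```

For 1024 elements that is 9217 additions versus 2046. Fewer operations is not automatically faster on a GPU (Blelloch needs twice as many sequential steps), so the algorithm has to be chosen with the hardware in mind before any low-level tuning starts.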

Has anyone tried “Accelerator” from Microsoft?

It is a .NET-based data-parallel library that can use the GPU transparently (via DirectX) to accelerate your application.
The application does NOT need to know anything about the GPU.

This was released way back in 2007 though.

And it achieves less than 50% of what native code achieves.

Not sure if Microsoft will pursue this, though…
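A minimal C++ sketch of the whole-array programming style described for Accelerator: the caller writes expressions over entire arrays and never touches threads or indices. This is not Accelerator’s API (Accelerator itself is a .NET library); `ParallelVec` is a hypothetical type that simply evaluates on the CPU, where a library like Accelerator would instead translate the operations into GPU work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical whole-array type: the library, not the caller, decides how
// each operation is executed. This sketch evaluates eagerly on the CPU.
class ParallelVec {
public:
    explicit ParallelVec(std::vector<float> d) : data_(std::move(d)) {}
    const std::vector<float>& values() const { return data_; }

    // Element-wise addition of two arrays of equal length.
    friend ParallelVec operator+(const ParallelVec& a, const ParallelVec& b) {
        return ParallelVec::zip(a, b, std::plus<float>());
    }
    // Scale every element by a scalar.
    friend ParallelVec operator*(const ParallelVec& a, float s) {
        std::vector<float> out(a.data_.size());
        std::transform(a.data_.begin(), a.data_.end(), out.begin(),
                       [s](float x) { return x * s; });
        return ParallelVec(std::move(out));
    }

private:
    template <class Op>
    static ParallelVec zip(const ParallelVec& a, const ParallelVec& b, Op op) {
        assert(a.data_.size() == b.data_.size());
        std::vector<float> out(a.data_.size());
        std::transform(a.data_.begin(), a.data_.end(), b.data_.begin(),
                       out.begin(), op);
        return ParallelVec(std::move(out));
    }
    std::vector<float> data_;
};
```

Usage looks like ordinary arithmetic, e.g. `ParallelVec r = x * 2.0f + y;`, which is what lets such a library hide the GPU entirely, at the cost of control over how the work is actually scheduled.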

If that is their reasoning, then IMO it is a step backwards.

Templates are not really tied to C++, not in spirit; they are a way of expressing ideas much more cleanly than macros. They are not that much more complicated for a compiler, nor for the binary at runtime, if carefully implemented.

@sarnath: I agree. We initially had a very low speedup, and only after rewriting parts of our algorithm did we get a significant speedup. But that approach would have been impossible with higher-level access (trust me, I checked it out), for example with Brook+. Again, in that sense there is very little difference between CUDA and OpenCL; it’s just that it will take a while for OpenCL to catch up. Hopefully the fact that it’s an open standard with a committee won’t hinder its progress (like what happened to OpenGL).

I see.

Were you using AMD Stream (since you talk about Brook) or CUDA?

I skimmed through AMD Stream and read through the Brook spec (a very informal spec). Somehow, I am not convinced by their design. It is all the more confusing, and it comes nowhere near CUDA. CUDA is much more elegant by design. Maybe, since I did not read deeply, my understanding is very peripheral. Do you have any experience with AMD Stream? How did you find it?

I started with both (and actually also looked into the Cell), and in the end we decided to go with CUDA. I agree with what you said about Brook; I guess AMD pretty much thinks so as well, since it stopped developing it and is now going with OpenCL.

Similar case here too. We looked into Cell and CUDA and zeroed in on CUDA. Cell is too costly (IBM Cell blade) for the performance it offers, and it runs only Linux. Though there are PCI-E based Cell accelerators from Mercury Systems, they are priced somewhere around 7000 USD, as I vaguely remember.

I don’t think AMD gave up on Stream. It was lying unattended for some time… but then they woke up, released CAL (Compute Abstraction Layer), did a press release (AMD does this one well) and promoted it, as I understand. But as I read through the AMD spec, I did not find the design appealing. It did not abstract the graphics as well as CUDA does.