CUDA vs OpenCL

Some interesting slides at [url="http://www.khronos.org/opencl/presentations/OpenCL_Summary_Nov08.pdf"]http://www.khronos.org/opencl/presentations/OpenCL_Summary_Nov08.pdf[/url] suggest that OpenCL is going to have a Runtime API and a Platform Layer API. Now I am wondering what the difference between CUDA and OpenCL is going to be, assuming NVIDIA provides an OpenCL runtime API implementation for their cards on Linux and Windows. I guess I will have to wait until the specification is released to see what's in the runtime API; apparently that should be this year :) It's encouraging that the time invested in learning CUDA appears to be relevant here, if OpenCL becomes a well-adopted standard with wide support like OpenGL has.

With NVIDIA talking about the "-M" option to compile CUDA for multi-core CPUs, I guess CUDA 2.1 will include OpenCL support as well…

Maybe I am dreaming too much…

No, I think that's just a 'runtime API' (i.e., ordinary jargon), not in any way related to CUDA's Runtime API. Note the second point on that same slide (#13): "Ecosystem foundation – no middleware or 'convenience' functions." In other words, OpenCL is going to be like the Driver API, and if you want convenience you'll get it from someone else as an add-on.
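
For readers who haven't used both CUDA APIs, the difference is easiest to see in host code; a rough, non-runnable fragment (file and kernel names are made up):

[code]
/* CUDA Runtime API style: the context is implicit and convenience calls
   like cudaMalloc hide the setup; kernels launch via <<<grid, block>>>. */
float *d_a;
cudaMalloc((void **)&d_a, n * sizeof(float));

/* CUDA Driver API style: everything explicit. OpenCL sits at roughly
   this level, with init, device, context and module managed by hand. */
cuInit(0);
CUdevice   dev;  cuDeviceGet(&dev, 0);
CUcontext  ctx;  cuCtxCreate(&ctx, 0, dev);
CUmodule   mod;  cuModuleLoad(&mod, "kernels.cubin");
CUfunction fn;   cuModuleGetFunction(&fn, mod, "myKernel");
[/code]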

Looks like OpenCL is trying to do great work bringing all sorts of hardware, including Cell, DSPs, and apparently handhelds, into the fold. But NVIDIA really showed some brilliance in worrying about the developer's experience when making the Runtime API. That is a crucial hinge to its success. But with NVIDIA's pioneering example, I'm sure OpenCL will gain a similar layer in time. Just not officially from the Khronos group (committees don't understand such subtle things as usability).

Yeah, I can totally understand where you're coming from. So what do you think the difference between 'Platform Layer API' and 'Runtime Layer API' (slide 14) might mean in practice?

OpenCL spec is now out.

Which means that I can answer (some) questions you have about it! If you read the spec, you will see some very strong similarities to CUDA (both in terms of the kernel language and the driver-style API)… :P
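
To illustrate the kernel-language similarity, here is a minimal OpenCL C vector-add kernel (names are made up), with the CUDA equivalents noted in the comments:

[code]
/* Hypothetical vector-add kernel in OpenCL C; the structure mirrors a
   CUDA kernel almost one-to-one. */
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *c,
                      unsigned int n)
{
    /* get_global_id(0) plays the role of
       blockIdx.x * blockDim.x + threadIdx.x in CUDA. */
    size_t i = get_global_id(0);
    if (i < n)
        c[i] = a[i] + b[i];
}
[/code]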

Okay ;) So does OpenCL eventually replace the CUDA driver API, with CUDA transitioning into a runtime API built on top of OpenCL?

No, they're separate APIs. There are things you can do in CUDA that aren't possible in OpenCL, and we don't expect that to change going forward (it's not like we're going to stop adding features just because OpenCL won't support them).

The list of contributors is interesting. Besides expected ones like Apple, AMD, NVIDIA, Intel, and IBM, there are a lot of cell-phone companies: ARM, Ericsson, Nokia, Qualcomm, Modivia.

Kind of makes sense, since cell phones are picking up ever beefier GPUs and their minuscule processors could use all the acceleration they can get. What I find intriguing is that the cell-phone space also brings in a lot of new GPU architectures. NVIDIA has been making CUDA simply to expose its G80/DirectX10 architecture, but when making a GPU-like accelerator there are actually a number of choices to make. (In fact, GPUs like G200 are packing on many features that are compromising their performance/transistor ratios.) Even DSPs are planned to be supported.

It’d be interesting to see how OpenCL balances the architectural possibilities.

Been reading the spec, especially in terms of balancing.

On the one hand, OpenCL is not doing what I want, which is to establish specific baseline configurations to target, each making different tradeoffs in performance vs. transistors. (E.g., target "shader model 2.0" or "shader model 3.0".) Instead you just get a big capabilities structure. This is flexible, but without focus and cohesion.
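
For context, this is roughly what the capabilities-driven approach looks like on the host side; a minimal sketch that assumes a single platform with at least one GPU device and omits all error handling:

[code]
/* Minimal capability query: instead of a named baseline, you interrogate
   each device property and extension string individually. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id   device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_ulong global_mem;
    size_t   max_wg;
    char     extensions[2048];
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(max_wg), &max_wg, NULL);
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS,
                    sizeof(extensions), extensions, NULL);

    printf("global mem: %llu bytes, max work-group size: %zu\n",
           (unsigned long long)global_mem, max_wg);
    printf("extensions: %s\n", extensions);
    return 0;
}
[/code]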

On the other hand, it's doing some nice stuff with the OpenCL kernel language itself. Very nice support for SIMD (for the archs that use it).
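
A small sketch of what that SIMD support looks like in the kernel language, using the built-in float4 vector type (kernel name is made up):

[code]
/* Hypothetical SAXPY on the built-in float4 vector type; on a SIMD
   architecture each float4 maps naturally onto one vector register. */
__kernel void saxpy4(__global const float4 *x,
                     __global float4 *y,
                     float a)
{
    size_t i = get_global_id(0);
    /* Scalar * vector and vector + vector are component-wise. */
    y[i] = a * x[i] + y[i];
}
[/code]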

Could you give some concrete examples so I get an idea of current OpenCL limitations vs CUDA? I ask because it seems to me that the main features are there (exposed as extensions), including the latest features added to CUDA: doubles, shared memory atomics, etc…
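
Those features are indeed extensions in the 1.0 spec and have to be switched on per kernel with pragmas; a rough fragment showing doubles (cl_khr_fp64) and a local-memory atomic (cl_khr_local_int32_base_atomics):

[code]
/* Fragment only: doubles and local-memory atomics are extensions in
   OpenCL 1.0 and must be enabled explicitly before use. */
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics : enable

__kernel void demo(__global const int *in,
                   __global double *out,
                   __local int *counter)
{
    size_t i = get_global_id(0);
    atom_inc(counter);               /* shared-memory atomic (extension) */
    out[i] = (double)in[i] * 0.5;    /* double precision (extension)     */
}
[/code]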

Also, in theory OpenGL interop is more efficient, allowing sharing of resources without memory copies (in 2.1 it would be device-memory copies and in earlier versions through system memory).
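
The sharing path being described is the cl_khr_gl_sharing extension; a rough host-side sketch, assuming a GL buffer object vbo, a sharing-enabled context ctx, and a command queue queue already exist (all names hypothetical):

[code]
/* Sketch only (needs CL/cl_gl.h and a context created with GL-sharing
   properties): wrap an existing GL buffer object as a CL buffer so a
   kernel can write it without an explicit copy through host memory. */
cl_int err;
cl_mem cl_buf = clCreateFromGLBuffer(ctx, CL_MEM_WRITE_ONLY, vbo, &err);

/* The object must be acquired from GL before kernels touch it,
   and released again afterwards. */
clEnqueueAcquireGLObjects(queue, 1, &cl_buf, 0, NULL, NULL);
/* ...enqueue kernels that write cl_buf... */
clEnqueueReleaseGLObjects(queue, 1, &cl_buf, 0, NULL, NULL);
[/code]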

Please correct me if I am wrong…

Also, I have the same questions about OpenCL support from NVIDIA…

Quoting a press release from AMD from today:

“AMD is making good progress on its OpenCL-compliant offering and plans to release a developer version of the ATI Stream SDK with support for OpenCL 1.0 for content developers in the first half of 2009. Working from early specifications of OpenCL, AMD’s engineering team has already started running code on its initial implementation.”

I have read an NVIDIA press release from today, but it isn't as concrete about the timing of the OpenCL software stack…

Can we also expect it in the first half of next year?

Thanks.

See section 6.8 in the spec. Most important are the limitations on pointers and datatypes less than four bytes.

I don’t think I’m allowed to give any sort of timeframe for support yet–I’ll get back to you.

The “limitation” on pointers is actually the feature that I’ve been asking for in CUDA… namely, explicit memory spaces for pointers.
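
Concretely, every pointer in an OpenCL kernel carries an address-space qualifier, which is exactly that kind of explicitness; a small illustrative kernel (names made up):

[code]
/* Every pointer is tagged with the memory space it points into, so the
   compiler rejects cross-space assignments instead of guessing. */
__kernel void scale(__global float *data,       /* device global memory  */
                    __constant float *coeffs,   /* constant memory       */
                    __local float *scratch)     /* on-chip shared memory */
{
    size_t gid = get_global_id(0);
    size_t lid = get_local_id(0);

    scratch[lid] = data[gid];          /* global -> local copy         */
    barrier(CLK_LOCAL_MEM_FENCE);      /* like __syncthreads() in CUDA */
    data[gid] = scratch[lid] * coeffs[0];
}
[/code]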

I think the limitation on sub-32-bit types is taken away by the "byte addressable stores" extension, no?
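
In the 1.0 spec that is the cl_khr_byte_addressable_store extension, which is enabled with a pragma in the kernel source; a minimal sketch:

[code]
/* Without this extension, OpenCL 1.0 disallows stores to types smaller
   than 32 bits (char, uchar, short, etc.). */
#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable

__kernel void to_bytes(__global const int *in, __global uchar *out)
{
    size_t i = get_global_id(0);
    out[i] = (uchar)(in[i] & 0xFF);    /* a byte-sized store */
}
[/code]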

You also can’t pass any type**.

Also, I don’t like relying on extensions for chars/shorts. Something about that feels a bit weird to me.

Like I said, there should be baselines. A DX10 baseline for NVIDIA/ATI GPUs. A Cell baseline (with all its idiosyncrasies, such as probably this one). A common baseline, with a lowest-common-denominator feature set. Each baseline could run on any hardware, with necessary workarounds done in software. But it would be very clear to the developer what model he's coding for and what hardware he's targeting for optimal performance (if any). It would be conscious, it would be clear, it would be simple, it would bring focus.

I just hate extensions and endless querying being the standard way to get anything done. It will just result in a mish-mash, a fog that will hurt the evolution of hardware.

Can this be circumvented by passing a single base pointer and a pointer to integer offsets? I have made frequent use of ** in my CUDA code. Is the reason for this restriction that some devices might do internal translation of pointers, which might be complicated by a type** of undefined length?
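
A hedged sketch of what that workaround could look like: a pointer-to-pointers case flattened into one base pointer plus per-row offsets (all names hypothetical):

[code]
/* Instead of passing float** (not allowed), pass one flat buffer plus
   per-row starting offsets and lengths into it. */
__kernel void sum_rows(__global const float *base,    /* all rows, flattened   */
                       __global const int *row_off,   /* start index per row   */
                       __global const int *row_len,   /* element count per row */
                       __global float *row_sum)
{
    size_t r = get_global_id(0);
    __global const float *row = base + row_off[r];    /* rebuilt "inner" pointer */

    float s = 0.0f;
    for (int j = 0; j < row_len[r]; ++j)
        s += row[j];
    row_sum[r] = s;
}
[/code]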

So which cards will be supported? All CUDA cards? Only CUDA 1.1? Only CUDA 2.0?

I expect all 8-series and up cards will be supported, like with CUDA. The newer cards’ features are already mapped as extensions in the OpenCL specification.

What can we expect as NVIDIA support for OpenCL? Will OpenCL programs be compiled with nvcc, or will there be another compiler driver? Are there plans to create a higher-level interface to OpenCL? Templated kernels were great, so hopefully there will be some sort of support for that.

You do not need to quit using CUDA. As Tim said, CUDA will stay and continue to be developed, since it supports more than OpenCL does.

And looking at all the similarities, converting a program from one into the other should not be too much of a problem, so algorithm development will likely not be lost.

No one knows

Yes, the whole idea is that third parties will make many high-level interfaces. (We'll see how well that actually turns out. Something tells me not so well, and all you'll get is a bunch of proprietary, expensive solutions with poor market share.) No idea whether they'll support templates, although likely not. (I'm not sure how many companies will try to extend the core OpenCL kernel language.)