Okay, so where is this all heading? Can I expect my CUDA code to run on multiple graphics chips in the foreseeable future? Or will OpenCL take over this role?
Should I get on the ball and start rewriting/porting my CUDA code to something else?
I personally prefer CUDA and the flexibility it offers in kernel programming, and I'd rather not rewrite my code unless necessary, but requiring NVIDIA chips makes no sense when you're trying to sell software in the general market.
OpenCL is based on CUDA, so if you write some CUDA kernels now and want to transfer them to OpenCL later, it should be a pretty simple job. If you've got to have the most flexibility, just write them in OpenCL off the bat – I think you can apply to the OpenCL beta program, but I don't know if they're still taking people.
So for me there's a future for OpenCL in GPGPU computing for commercial applications (with the $29 Mac OS X 10.6 Snow Leopard upgrade), once the OpenCL platform is released (and stabilized!).
Anyway, in the meantime I'm focusing on CUDA development because it teaches me everything about GPGPU that I need to know, and moreover lets me develop applications targeted at this architecture on a small-scale basis (company-level or often workgroup-level).
CUDA is important to fully understand and master if you want to develop for OpenCL in the future, so go for CUDA right now :-)
What other graphics chips or other processors are you thinking of here? I don't think that there are any major GPU platforms other than NVIDIA, AMD, and Intel. Of these, only AMD and NVIDIA offer really high performance solutions. You could try to run something on an Intel GMA, but I think you would probably be better off sticking to CPU-only code at that point for many applications…
So for high performance GPUs you are more or less deciding between AMD and NVIDIA. Does it really matter that you have to tell customers they need to buy an NVIDIA GPU rather than just any high performance graphics card?
If you want to consider non-GPU architectures, the only alternatives at this point are the high end x86 multicore processors from Intel and AMD (IBM seems to be moving PowerPC in more of an embedded direction, and I can't find a single reference to a Cell roadmap beyond 2008).
For these architectures I see the CUDA programming model fitting naturally; I personally strongly prefer it to alternatives such as threading libraries and OpenMP. OpenCL seems much less mature than CUDA at this point, but from a programming perspective they are incredibly similar. Until OpenCL gives you access to a target that CUDA doesn't, I would stay away unless you particularly like some feature not available in CUDA.
Additionally, support for other high performance architectures such as Itanium is getting dropped by much more mature projects like LLVM, and I haven't heard anything about the next generation of Niagara…
Yes my post was rather terse in tone and had a few glaring omissions, but this is all subjective and based entirely on my own opinions which may or may not be accurate, so I will continue in the same tone. Feel free to ignore me.
In regards to Larrabee, I would still say that it has not been released. Its current market share is literally zero, except maybe among researchers working on prototypes internal to Intel and its partners. I don't think that it is a good idea to talk about writing new code for a new architecture before you have any idea how it performs.
If at some time in the future Intel releases Larrabee, and it is competitive with NVIDIA and AMD, and Intel releases an OpenCL compiler for it, and it gains significant market share, then I would think about starting to port my code to OpenCL.
The next biggest determinant of long-term success in my mind between OpenCL and CUDA (and alternatives like Intel Ct or TBB) is support for the two major high performance architectures other than NVIDIA: AMD Radeons and Intel/AMD x86 multicore. I doubt that AMD would actually release their own CUDA compiler, but there is nothing stopping a third party from doing it.
I write CUDA (runtime API) code and actually make a living selling it to customers. When I tell them the code runs 350x faster on dual GTX 280s than it does on a 4-core 2.8 GHz Xeon, my customers don't mind buying NVIDIA cards. As for my own investment: the code I use (for the GPU) is almost literally plain C – I can take that code and run it on a CPU with minimal changes. It was rather more painful rewriting code for the CPU that uses Intel's vector math and MKL libraries – without those we would be talking a 600-800x speedup for the code I am talking about.
Even if I “only” get a 10x speedup, my customers are still happy.