I was putting together a GPU system with the AMD ATI Radeon 5870, which boasts 2 teraflops per GPU node. All this on an Asus P6T7 WS Supercomputer with 3 GPUs (plus one more at half speed). The CUDA software says it would work with any GPUs, but I am not quite sure. So, I am posting here to find a good parallel processing software solution. That, or suggestions on a cheaper acquisition of Teslas.
CUDA won’t work on an AMD/ATI system (CUDA is NVIDIA-only). You could theoretically use OpenCL, but from what I hear, AMD/ATI haven’t released any OpenCL drivers for their GPUs yet; they do have multicore CPU drivers available though, which you could program against until they come out with OpenCL GPU drivers for the Radeon.
If you are just using the Teslas for development purposes, nVidia occasionally offers developer discounts. If it’s a production machine though, you’ll probably have to pay full price for them. If you wanted to use the (IMO, much better) nVidia developer tools, you should have gotten some GTX200-series cards, or just waited a bit longer for the Fermi-based cards.
is OpenCL so similar to CUDA that you can change the code back and forth easily?
If not, then wouldn’t going the OpenCL route mean living with a 2-year lag in the available tools, libraries & applications?
OpenCL is similar to the CUDA driver API. So if you have programs that have been written against the driver API, it should be easy to port to OpenCL (and vice versa). If your program is using the runtime API, you’ll have to rewrite some of the host code.
Kernels would stay mostly the same*, unless you were using textures (and even then it would mostly be a question of changing function names, not the whole program flow). The code needed to actually launch your kernels would be much more elaborate than with the runtime API.
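To make the “kernels stay mostly the same” point concrete, here is a minimal SAXPY kernel sketched in both dialects. This is an illustrative example, not code from any particular project; the bodies are identical, and only the qualifiers and the thread-index query differ:

```c
/* CUDA version: global thread index computed from block/thread builtins */
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

/* OpenCL version: same body; __kernel/__global qualifiers and
   get_global_id() replace the CUDA builtins */
__kernel void saxpy(int n, float a, __global const float *x, __global float *y)
{
    int i = get_global_id(0);
    if (i < n)
        y[i] = a * x[i] + y[i];
}
```

The host-side setup around these (context creation, building the program, setting arguments, enqueueing) is where the OpenCL/driver-API verbosity shows up, not in the kernels themselves.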
that’s assuming you kept using NVIDIA hardware. AMD GPUs will be underutilized by CUDA-style kernels because they are explicitly programmed vector machines. In CUDA (and NVIDIA’s OpenCL) you express massive scalar parallelism; for CPUs and AMD’s GPUs you’ll want to use vector intrinsics. If you were to launch a kernel coded for an NVIDIA GPU on an AMD GPU, it should work, but it would only use a fraction of the resources. It should work better the other way around (i.e. vector intrinsics on NVIDIA cards).
edit: oh yeah, and if you used templates in your CUDA kernels - forget about them. OpenCL is C, without any signs of C++fication.