My group is starting a new project that should make heavy use of GPUs for processing, due to its high data parallelism.
The preference in our group is to use OpenCL for portability and general vendor independence. The issue I have run into is that there doesn’t seem to be any complete and optimised BLAS / LAPACK implementation for OpenCL (in contrast to CUDA). Did I miss something?
As far as I know, the kernel execution model of OpenCL doesn’t allow existing CUDA binaries to be used either, although reusing them is something we could settle for in the meantime.