BLAS / LAPACK for OpenCL

My group is starting a new project that will make heavy use of GPUs for processing, due to high data parallelism.

Our group prefers OpenCL for portability and general vendor independence. The issue I have run into is that there doesn’t seem to be any complete and optimised BLAS / LAPACK implementation for OpenCL yet (in contrast to CUDA). Did I just miss something?
As far as I know, OpenCL’s kernel execution model doesn’t allow existing CUDA binaries to be used either, which is something we could have settled for in the meantime.

Regards

As far as I know, ViennaCL is currently the only thing out there. It supports Level 1 and Level 2 BLAS functions only. Some time back, however, I saw an AMD Stream roadmap in which Stream SDK 2.3 was advertised as including an FFT library and a beta version of something called OpenPhysics SDK.
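
In case the level split is unclear: Level 1 BLAS covers vector-vector operations, Level 2 covers matrix-vector operations, and Level 3 (the missing part) covers matrix-matrix operations such as GEMM, which LAPACK relies on heavily. Below is a rough plain-Python sketch of one routine from each of the two supported levels. The function names follow the standard BLAS names (AXPY, GEMV), but the signatures are simplified for illustration. This is not ViennaCL's API and it is not optimised in any way.

```python
# Reference (unoptimised) versions of one Level 1 and one Level 2 BLAS routine.
# Names mirror standard BLAS routines; signatures are simplified assumptions
# for illustration and do not correspond to ViennaCL or any real library.

def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha * x + y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(alpha, A, x, beta, y):
    """Level 2 (matrix-vector): return alpha * (A times x) + beta * y.

    A is a list of rows, so it has len(y) rows and len(x) columns.
    """
    if len(A) != len(y) or any(len(row) != len(x) for row in A):
        raise ValueError("dimension mismatch")
    return [
        alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
        for row, y_i in zip(A, y)
    ]


if __name__ == "__main__":
    print(axpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
    print(gemv(1.0, [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], 0.5, [2.0, 4.0]))
```

Every element of the result is independent of the others in both routines, which is why they map well onto a GPU. Level 3 routines have much more data reuse and are considerably harder to tune per device.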