Is it possible to utilize CUBLAS from OpenCL somehow?
I doubt it, as there does not seem to be a valid CUDA context while OpenCL is running. At least, the CUDA context is not bound to the current thread, as cuCtxGetCurrent does not work. It is my understanding that this call is what libraries should use to get the proper CUDA context. I would be very happy to hear about any progress you make in getting this working, as it would also open the way for us to use nice CUDA functionality like the cache configuration.
- Create a CUDA context and simply pass the data from OpenCL to CUDA and back again. Ugly overhead, but still possibly worth it.
- Use AMD’s BLAS library (note that ArrayFire OpenCL has this integrated).
Thanks, everyone! I am looking at all the options. I could not get the latest AMD BLAS to work on Nvidia hardware, and I see that another user on the AMD forums had problems similar to mine.
melonakos, can you post which version you use? :)
I also looked at ViennaCL (http://viennacl.sourceforge.net/). It builds and runs fine on both AMD and Nvidia hardware, but the performance was not as good as I would like.
libclAmdBlas.so.1.4.182, so v1.4
Thanks! Will try that.