Hello,
I found out about CULA and just read the Programmer's Guide. It says CULA has a Standard interface, which takes host memory as input and does all the transferring for you, and a Device interface, which is for memory that is already on the GPU.
Does anyone know whether it is possible to call functions from CULA's Device interface from inside my own kernels?