I have a few questions about the CUBLAS implementation. From everything I can tell, the functions in cublas.h are host functions (rather than __device__ or __global__ functions). Thus, I call them on the CPU and they run there. This is how they are used in the simpleCUBLAS example, and I have been able to use them this way in simple examples of my own. From what I can tell, you cannot call the CUBLAS routines from within a device function.
The CUBLAS routines take pointers to data that lives in device memory, and these pointers (according to the CUBLAS docs) cannot be dereferenced on the host.
At the beginning of the CUBLAS docs, it says that calls like:
become the following in CUBLAS:
#define IDX2(i,j,ld) (((j)*(ld))+(i))
But this last call doesn't seem to make sense. If I understand things correctly, w is a pointer to a chunk of device memory, so you can't dereference it on the host like this!! Am I crazy?
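One possible reading, sketched in plain C under the assumption that the call in question passes an expression of the form &w[IDX2(i,j,ld)]: in C, &w[k] is defined as w + k, which is address arithmetic and does not load anything from w. The helper name elem_addr below is hypothetical and only illustrates that point on an ordinary host array; it says nothing about what CUBLAS does internally.

```c
#include <stddef.h>

/* Column-major index macro as quoted in the post (0-based i and j). */
#define IDX2(i,j,ld) (((j)*(ld))+(i))

/* Hypothetical helper: returns the address of element (i, j) of a
   column-major matrix with leading dimension ld.
   &w[k] is the same as w + k, so the element itself is never read. */
static const float *elem_addr(const float *w, int i, int j, int ld)
{
    return &w[IDX2(i, j, ld)];
}
```

If that reading is right, the host only computes an offset into the device allocation and hands the resulting address to the library, which is different from reading w[k] on the host.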
If this is the case, how do you really do a call like the above cublasSdot example? I am particularly interested in doing BLAS operations on rows and columns of matrices.
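For the row and column case, here is a minimal host-only sketch of how a start address plus a stride selects a row or a column of a column-major matrix. host_sdot is a hypothetical stand-in that borrows the (n, x, incx, y, incy) argument order of cublasSdot; col_dot and row_dot are likewise made-up names. It assumes positive strides and runs entirely on the CPU, so it shows only the indexing, not the device call.

```c
#include <stddef.h>

/* Column-major index macro as quoted in the post (0-based i and j). */
#define IDX2(i,j,ld) (((j)*(ld))+(i))

/* Hypothetical host stand-in using the argument order of cublasSdot:
   dot product of n elements of x (stride incx) and y (stride incy).
   Assumes incx > 0 and incy > 0. */
static float host_sdot(int n, const float *x, int incx,
                       const float *y, int incy)
{
    float sum = 0.0f;
    for (int k = 0; k < n; ++k)
        sum += x[(size_t)k * incx] * y[(size_t)k * incy];
    return sum;
}

/* Dot product of columns ja and jb: a column has m elements and is
   contiguous in column-major storage, so the stride is 1. */
static float col_dot(const float *w, int m, int ld, int ja, int jb)
{
    return host_sdot(m, &w[IDX2(0, ja, ld)], 1, &w[IDX2(0, jb, ld)], 1);
}

/* Dot product of rows ia and ib: a row has n elements, and consecutive
   elements are ld apart, so the stride is ld. */
static float row_dot(const float *w, int n, int ld, int ia, int ib)
{
    return host_sdot(n, &w[IDX2(ia, 0, ld)], ld, &w[IDX2(ib, 0, ld)], ld);
}
```

The same start-address and stride pairs would presumably be what gets passed to the real routine, with w being the device pointer instead of a host array.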