I am trying to compute the dot product of two vectors using a CUBLAS function. My problem is that I don't know how to get the calculated result back from the device. Can someone tell me how to continue with this code, please?
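A minimal sketch, assuming the cuBLAS v2 API (`cublas_v2.h`). There, `cublasSdot` takes a handle and a `float* result`. In the default pointer mode (`CUBLAS_POINTER_MODE_HOST`), `result` is a host address, and the call returns only once the value has been written there, so no extra `cudaMemcpy` is needed for the scalar. In the older legacy API (`cublas.h`), `cublasSdot(n, x, incx, y, incy)` instead returns the `float` directly. The helper name `gpu_dot` is hypothetical, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: copies x and y to the device, computes x . y with
// cublasSdot, and returns the result as an ordinary host float.
float gpu_dot(const float* h_x, const float* h_y, int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    // Host pointer mode (the default, set explicitly here): `result` lives in
    // host memory and cublasSdot blocks until the value is available.
    cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_HOST);

    float result = 0.0f;
    cublasSdot(handle, n, d_x, 1, d_y, 1, &result);

    cublasDestroy(handle);
    cudaFree(d_x);
    cudaFree(d_y);
    return result;
}

int main() {
    const int n = 1024;
    std::vector<float> x(n, 2.0f), y(n, 4.0f);
    std::printf("dot = %f\n", gpu_dot(x.data(), y.data(), n));  // 1024 * 2 * 4 = 8192
    return 0;
}
```

Compile with something like `nvcc dot.cu -lcublas`. If you instead set `CUBLAS_POINTER_MODE_DEVICE`, `result` must point to device memory and you would copy it back yourself.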
Another thing I don't really understand is how to use multithreading with CUBLAS. How is cublasSdot executed in my program? I guess only one thread is working on it: because cublasSdot is a host function, I can't put it inside a kernel where I could work with threads. So how is multithreading realised with CUBLAS functions? (I hope I didn't skip this part when reading the CUBLAS manual.)
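Although cublasSdot is called from the host, it does not run on a single thread: like any CUDA host function that launches work, it starts kernels on the GPU that use many threads, and you never write those kernels yourself. The sketch below is only an illustration of that pattern with a hand-written dot-product kernel (grid-stride loop plus shared-memory reduction); it is not how cuBLAS is implemented internally, whose kernels are not public. The names `dot_kernel`, `manual_dot` and `kThreadsPerBlock` are hypothetical, the float `atomicAdd` assumes compute capability 2.0 or newer, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreadsPerBlock = 256;  // must be a power of two for the reduction below

// Runs on the GPU: each thread accumulates a strided partial sum, each block
// reduces its threads' partial sums in shared memory, and thread 0 of each
// block adds the block total to *out.
__global__ void dot_kernel(const float* x, const float* y, int n, float* out) {
    __shared__ float partial[kThreadsPerBlock];
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x) {
        sum += x[i] * y[i];
    }
    partial[threadIdx.x] = sum;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) {
        atomicAdd(out, partial[0]);
    }
}

// Runs on the host, like cublasSdot: it only moves data and launches the kernel.
float manual_dot(const float* h_x, const float* h_y, int n) {
    if (n <= 0) return 0.0f;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_x = nullptr, *d_y = nullptr, *d_out = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));  // all-zero bits represent 0.0f

    int blocks = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;
    if (blocks > 1024) blocks = 1024;  // the grid-stride loop covers the remaining elements
    dot_kernel<<<blocks, kThreadsPerBlock>>>(d_x, d_y, n, d_out);

    float result = 0.0f;
    // cudaMemcpy on the default stream waits for the kernel to finish first.
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_out);
    return result;
}

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 0.5f);
    std::printf("dot = %f\n", manual_dot(x.data(), y.data(), n));  // 2^20 * 0.5 = 524288
    return 0;
}
```

With n = 2^20 this launches 1024 blocks of 256 threads, i.e. 262144 GPU threads from a single host call. Calling a library routine such as cublasSdot gives you the same kind of parallelism without writing the kernel.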
Thank you for your hints. I used a 10000x10000 matrix and executed cublasSaxpy() a hundred times in a loop. The elapsed time was 33 ms with CUBLAS and around 170000 ms with a self-written function doing the same operations, so CUBLAS does seem to be multi-threaded.
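When timing cuBLAS calls, keep in mind that they are asynchronous with respect to the host: cublasSaxpy usually returns as soon as its kernel is queued. A host timer stopped right after the loop can therefore measure mostly launch overhead rather than GPU work. One way to measure the GPU time is with CUDA events, as in this sketch. It assumes the cuBLAS v2 API; the helper name `time_saxpy_ms` and the value of `alpha` are hypothetical, the arrays are simply zero-filled, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: times `reps` calls of y = alpha * x + y on n floats
// and returns the elapsed GPU time in milliseconds.
float time_saxpy_ms(int n, int reps) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMemset(d_x, 0, bytes);  // values do not matter for timing
    cudaMemset(d_y, 0, bytes);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 2.0f;  // host scalar (default host pointer mode)

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);  // recorded on the default stream, like the cuBLAS calls
    for (int r = 0; r < reps; ++r) {
        cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);
    }
    cudaEventRecord(stop);
    // The saxpy calls only queue work; wait until the last one has finished.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cublasDestroy(handle);
    cudaFree(d_x);
    cudaFree(d_y);
    return ms;
}

int main() {
    // 10000 x 10000 floats treated as one vector; needs about 800 MB of device memory.
    const int n = 10000 * 10000;
    std::printf("100 x cublasSaxpy: %f ms\n", time_saxpy_ms(n, 100));
    return 0;
}
```

If you prefer a host-side timer, calling `cudaDeviceSynchronize()` before stopping it has a similar effect.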