Newbie question: CUBLAS currently offers a set of BLAS calls and the corresponding get and set functions for matrices and vectors. However, it does not offer get / set functions for single elements.
This usually makes sense. However, my code runs a large calculation on the GPU (a couple of sger / sgemm calls), and then I need a single element of the result on the CPU to decide on the next sger / sgemm call.
Can I get a single element by abusing the cublasGetVector / cublasGetMatrix calls, i.e. asking for just one element and shifting the void* device pointer by the necessary amount?
Or would I have to use CUDA instead? I really don’t want to send the whole matrix back to the CPU…
Or am I overlooking something trivial here?
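To make the pointer-shifting idea concrete, here is a minimal sketch of what I have in mind, assuming the matrix is stored column-major with 0-based indices. The helper name get_element and the IDX2C macro are just illustrative names I made up, and memcpy on host memory stands in for the device-to-host copy so the snippet compiles without a GPU. The comment shows the cublasGetVector call I would expect to use on a real device pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Column-major index of element (i, j), 0-based, in a matrix whose
   leading dimension (number of rows in storage) is ld. */
#define IDX2C(i, j, ld) ((size_t)(j) * (size_t)(ld) + (size_t)(i))

/* Hypothetical helper: fetch the single element (i, j) of a column-major
   matrix. memcpy on host memory stands in for the device-to-host transfer.
   With a float* device pointer d_A, the corresponding call would be
     cublasGetVector(1, sizeof(float), d_A + IDX2C(i, j, ld), 1, &value, 1);
   which copies exactly one float. */
static float get_element(const float *A, int ld, int cols, int i, int j)
{
    float value;
    assert(i >= 0 && i < ld && j >= 0 && j < cols);
    memcpy(&value, A + IDX2C(i, j, ld), sizeof value);
    return value;
}
```

Note that the offset is added to a float*, so it counts elements. If the pointer is kept as a void*, the shift would have to be in bytes instead, i.e. IDX2C(i, j, ld) * sizeof(float) on a char* cast.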