What would be a fast way to subtract a vector V from every column of a matrix M using (preferably) CUBLAS?
I first tried constructing a matrix A on the host with the same size as M and with each column equal to V, then copying it to the device with cublasSetMatrix(). However, wouldn’t it be much faster to copy only V from host to device and then construct A on the device? I can’t find a way to do that using CUBLAS. Any suggestions?
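One possible way to avoid building A at all, as a sketch rather than a definitive answer: subtracting V from every column of M is the rank-1 update M := M - V * ones^T, which is the BLAS "ger" operation (cublasSger in CUBLAS, for single precision). The block below is a CPU reference of that formulation in column-major storage, so the arithmetic can be checked without a GPU. The names ger_reference and subtract_from_columns are hypothetical helpers, and d_V, d_ones, d_M and handle in the comment are assumed device pointers and a CUBLAS handle that are not part of the original question.

```cpp
#include <cassert>
#include <vector>

// CPU reference for the BLAS rank-1 update A := alpha * x * y^T + A,
// with A stored column-major (m rows, n columns, leading dimension lda).
// Hypothetical helper, written only to illustrate what "ger" computes.
void ger_reference(int m, int n, float alpha,
                   const std::vector<float>& x,  // length m
                   const std::vector<float>& y,  // length n
                   std::vector<float>& A, int lda) {
    assert(lda >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            A[i + j * lda] += alpha * x[i] * y[j];
        }
    }
}

// Subtract V (length m) from every column of the m-by-n matrix M
// by choosing y = vector of ones and alpha = -1. Hypothetical helper.
void subtract_from_columns(int m, int n,
                           const std::vector<float>& V,
                           std::vector<float>& M) {
    std::vector<float> ones(n, 1.0f);
    ger_reference(m, n, -1.0f, V, ones, M, m);
    // Assumed device-side equivalent, with d_V (m floats), d_ones (n floats,
    // all 1.0f) and d_M (m*n floats, column-major) already on the device:
    //   const float alpha = -1.0f;
    //   cublasSger(handle, m, n, &alpha, d_V, 1, d_ones, 1, d_M, m);
}
```

With this formulation only V (m values) and a vector of n ones need to be on the device, instead of a full m-by-n copy of A. Whether it is actually faster than a small custom kernel would need to be measured for the matrix sizes involved.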