I have a batch of matrices stored in a strided layout, to be processed by cuBLAS’ gemmStridedBatched. Now I would like to add a constant to the diagonal elements of every matrix, i.e. to perform the operation
M[i] = M[i] + c*I,
where I is the identity matrix and c is a constant that is the same for all matrices in the batch. I looked for a strided batched AXPY, but cuBLAS doesn’t seem to have implemented that (for now?). Do you have a hint for how to do this efficiently? In the end, I would like to calculate
M[i] = M[i] + c*I + A[i]*B[i] efficiently for a batch of small matrices.
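
For illustration, here is a minimal sketch of one way this could be done: a small custom kernel adds c to the diagonals, and the GEMM is then accumulated into M by passing beta = 1. It assumes single-precision, square n x n, column-major matrices packed back to back (leading dimension n, stride n*n). The names add_to_diagonal and shifted_gemm_strided_batched are hypothetical helpers made up for this example, not part of cuBLAS.

```cuda
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper kernel: adds c to every diagonal element of each
// n x n matrix in a strided batch. One thread handles one diagonal element.
__global__ void add_to_diagonal(float* M, int n, int ld, long long stride,
                                int batch, float c)
{
    long long idx = blockIdx.x * (long long)blockDim.x + threadIdx.x;
    if (idx < (long long)batch * n) {
        long long b = idx / n;  // matrix index within the batch
        long long j = idx % n;  // position along the diagonal
        // Element (j, j) sits at offset j * (ld + 1) inside its matrix.
        M[b * stride + j * (ld + 1)] += c;
    }
}

// Hypothetical wrapper computing M[i] = M[i] + c*I + A[i]*B[i].
// Assumption: A, B and M are device pointers to packed n x n matrices.
cublasStatus_t shifted_gemm_strided_batched(cublasHandle_t handle, int n,
                                            const float* A, const float* B,
                                            float* M, int batch, float c)
{
    if (n <= 0 || batch <= 0) return CUBLAS_STATUS_SUCCESS;

    // Launch the kernel on the handle's stream so that it is ordered
    // with respect to the GEMM that follows.
    cudaStream_t stream;
    cublasStatus_t status = cublasGetStream(handle, &stream);
    if (status != CUBLAS_STATUS_SUCCESS) return status;

    const long long stride = (long long)n * n;
    const int threads = 256;
    const unsigned int blocks =
        (unsigned int)(((long long)batch * n + threads - 1) / threads);
    add_to_diagonal<<<blocks, threads, 0, stream>>>(M, n, n, stride, batch, c);
    if (cudaGetLastError() != cudaSuccess) return CUBLAS_STATUS_EXECUTION_FAILED;

    // beta = 1 makes the GEMM accumulate into M: M[i] = A[i]*B[i] + M[i].
    const float alpha = 1.0f, beta = 1.0f;
    return cublasSgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                     n, n, n, &alpha,
                                     A, n, stride,
                                     B, n, stride,
                                     &beta,
                                     M, n, stride, batch);
}
```

Since both steps only add into M, their order should not matter. The diagonal kernel touches just n elements per matrix, so for small matrices its cost is likely dominated by launch overhead rather than memory traffic.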