Hi,

I am new to the CUDA programming world and would like to start with the following problem: I would like to multiply a huge batch of matrices together. All matrices originate from two "base" matrices A and B. In addition, I have an array v. For each value v[i] I would like to calculate the matrix C = A + v[i]*B, then apply a matrix function to the result, obtaining D = func(C) (also on the GPU), and finally I would like to compute the matrix product of all the resulting matrices D, i.e. D1 * D2 * D3 etc. The matrices A and B are of the order of 200 x 200, but the array v can be long.
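To make the problem concrete (and to have something to check a GPU version against later), here is a small NumPy reference sketch of the computation I have in mind. The names are just illustrative, and I apply `func` elementwise here for simplicity; depending on the application it could also be a true matrix function such as a matrix exponential:

```python
import numpy as np

def chained_product(A, B, v, func):
    """CPU reference: D_i = func(A + v[i] * B); return D_1 @ D_2 @ ... @ D_n."""
    n = A.shape[0]
    result = np.eye(n)          # identity as the neutral element of the product
    for vi in v:
        C = A + vi * B          # C = A + v[i] * B
        D = func(C)             # matrix function (here applied elementwise)
        result = result @ D     # accumulate the chained product in order
    return result
```

The sequential loop is of course exactly what I hope to avoid on the GPU, but it defines the result unambiguously.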

Since I am new to the CUDA world, I would like to ask how to approach such a problem. I have taken a look at cuBLAS and the gemmBatched functions, but those perform pairwise multiplications of two arrays of matrices, which does not directly match the chained product I need. I would like to use this problem as an opportunity to learn more about doing linear algebra with CUDA. I am grateful for any hints on how to start…

Julia