I treated the problem of multiplying a matrix by 10 successive vectors as a matrix multiplication problem, since multiplying the matrix by one vector at a time gives the same results as multiplying it by all the vectors at once (as in a matrix multiplication).
To do this, I transposed the vectors, since they were stored in row-major order.
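For anyone who wants to check the idea on the CPU first, here is a minimal sketch in plain Python. It is not the code from the attachment: the sizes (`M`, `N`, `K`), the test data, and the helper names `matvec`, `transpose`, and `matmul` are all assumptions made for illustration. It stores the vectors one per row, transposes them so that each vector becomes a column, and compares the single matrix product against the one-vector-at-a-time results.

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(mat):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*mat)]


def matmul(A, B):
    """Multiply A (M x N) by B (N x K), both stored as lists of rows."""
    B_cols = transpose(B)  # row k of B_cols is column k of B
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols]
            for row in A]


# Hypothetical sizes: an M x N matrix and K vectors of length N.
M, N, K = 3, 4, 10

# Integer test data so the comparison below is exact.
A = [[i * N + j for j in range(N)] for i in range(M)]
V = [[k + j for j in range(N)] for k in range(K)]  # one vector per row (K x N)

# Approach 1: one matrix-vector product per vector (K results of length M).
one_at_a_time = [matvec(A, v) for v in V]

# Approach 2: transpose the vectors to N x K so each vector is a column,
# then do a single matrix-matrix product (M x K result).
batched = matmul(A, transpose(V))

# Column k of the batched result should equal the k-th matrix-vector product.
assert transpose(batched) == one_at_a_time
```

Note that the batched result comes out as M x K, so column k (not row k) holds the product for vector k. Comparing kernel output against a reference like this, element by element, is one way to find out whether the indexing or the transpose is where things go wrong.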
I have attached my CPU-side code and kernel below for reference.
I am totally stuck on this mini project. Could someone please tell me where and what is wrong?
I am pretty sure my logic is correct, but I can't get it to work.
Any help will be greatly appreciated.
code.zip (4.31 KB)