Hi,

I treated the problem of multiplying a matrix by 10 successive vectors as a single matrix multiplication, since multiplying the matrix by one vector at a time gives the same result as multiplying it by all the vectors at once (as in a matrix multiplication).
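
To make that equivalence concrete, here is a minimal CPU-side sketch in C++. It is not the attached code: the names matVec and matMul are made up for illustration, and row-major storage with one vector per column of X is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A * x, where A is rows x cols in row-major order (assumed layout).
std::vector<float> matVec(const std::vector<float>& A, std::size_t rows,
                          std::size_t cols, const std::vector<float>& x) {
    assert(A.size() == rows * cols && x.size() == cols);
    std::vector<float> y(rows, 0.0f);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            y[i] += A[i * cols + j] * x[j];
    return y;
}

// C = A * X, where X is cols x k with one vector per column.
// A, X and C are all row-major; C is rows x k.
std::vector<float> matMul(const std::vector<float>& A, std::size_t rows,
                          std::size_t cols, const std::vector<float>& X,
                          std::size_t k) {
    assert(A.size() == rows * cols && X.size() == cols * k);
    std::vector<float> C(rows * k, 0.0f);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t c = 0; c < k; ++c)
            for (std::size_t j = 0; j < cols; ++j)
                C[i * k + c] += A[i * cols + j] * X[j * k + c];
    return C;
}
```

Column c of the result of matMul should match matVec applied to the c-th vector, so a reference like this can be used to check the output of a GPU kernel element by element.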

To do this, I transposed the vectors, since they were stored in row-major order.
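
The transpose step can be sketched on the CPU roughly as follows. This is only an illustration under assumptions, not the attached code: the name transposeRowMajor is hypothetical, and the input is assumed to hold one vector per row (rows x cols, row-major), so the output holds one vector per column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose a rows x cols row-major array into a cols x rows row-major array.
// With one vector per input row, the output has one vector per column.
std::vector<float> transposeRowMajor(const std::vector<float>& in,
                                     std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    std::vector<float> out(cols * rows);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];  // element (r, c) moves to (c, r)
    return out;
}
```

Indexing is a common place for things to go wrong here: after the transpose, the row stride is the number of vectors, not the vector length.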

I have attached my CPU-side code and kernel below for reference.

I am totally stuck on this mini project. Could someone please tell me what is wrong and where?

I am pretty sure my logic is correct, but I can't get it to work.

Any help will be greatly appreciated.

Thanks

code.zip (4.31 KB)