CUDA: matrix multiplication using shared and global memory

I’m trying to multiply a 3x3 matrix with a 360x360 matrix. The smaller (3x3) matrix is multiplied with the first 3x3 block of the big matrix, then with the next block, and so forth. In other words, I want to keep the smaller matrix constant and slide it over the bigger matrix.

Is it possible to store the smaller matrix in shared memory and keep the bigger matrix, divided into 3x3 blocks, in global memory?

I can’t find a way to copy the smaller matrix to shared memory directly from the host.

I later learnt that constant memory can be used to accomplish this. Is this the only way forward, or is there a better way? Also, is there an example similar to my use case that uses constant memory and cudaMemcpyToSymbol?

Thanks.

You got a pretty good answer to your question here:

c++ - cuda: matrix multiplication using shared and global - Stack Overflow
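
For reference, here is a minimal sketch of the constant-memory approach asked about above. Shared memory cannot be written from the host: it is per-block and is filled by the threads of a running kernel. Constant memory, on the other hand, can be set from the host with cudaMemcpyToSymbol. The sketch assumes that "sliding" means each 3x3 tile of the big matrix is left-multiplied by the small matrix (out_tile = small x tile) and that both matrices are stored row-major. The names c_small, tileMultiply, tileMultiplyCpu, runExample and CUDA_CHECK, as well as the test data, are made up for illustration.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int N = 360;  // big matrix is N x N (size taken from the question)
constexpr int T = 3;    // small matrix is T x T

// Hypothetical helper: abort on any CUDA runtime error.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error: %s\n",                       \
                         cudaGetErrorString(err_));                        \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// The small matrix lives in constant memory (name is illustrative).
__constant__ float c_small[T * T];

// Assumed semantics: every T x T tile of `big` is left-multiplied by c_small.
// One thread computes one output element.
__global__ void tileMultiply(const float* big, float* out, int n)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row >= n || col >= n) return;

    int localRow = row % T;         // row inside the tile
    int tileRow0 = row - localRow;  // first row of the tile

    float acc = 0.0f;
    for (int k = 0; k < T; ++k)
        acc += c_small[localRow * T + k] * big[(tileRow0 + k) * n + col];
    out[row * n + col] = acc;
}

// CPU reference with the same assumed semantics, used only for checking.
void tileMultiplyCpu(const float* small, const float* big, float* out, int n)
{
    for (int row = 0; row < n; ++row)
        for (int col = 0; col < n; ++col) {
            int localRow = row % T, tileRow0 = row - localRow;
            float acc = 0.0f;
            for (int k = 0; k < T; ++k)
                acc += small[localRow * T + k] * big[(tileRow0 + k) * n + col];
            out[row * n + col] = acc;
        }
}

// Runs the GPU kernel and returns the number of elements that differ
// from the CPU reference.
int runExample()
{
    std::vector<float> hSmall(T * T), hBig(N * N), hOut(N * N), hRef(N * N);
    for (int i = 0; i < T * T; ++i) hSmall[i] = float(i + 1);  // arbitrary data
    for (int i = 0; i < N * N; ++i) hBig[i] = float(i % 7);    // arbitrary data

    // Host -> constant memory; no cudaMalloc is needed for the symbol.
    CUDA_CHECK(cudaMemcpyToSymbol(c_small, hSmall.data(), T * T * sizeof(float)));

    size_t bytes = size_t(N) * N * sizeof(float);
    float *dBig = nullptr, *dOut = nullptr;
    CUDA_CHECK(cudaMalloc(&dBig, bytes));
    CUDA_CHECK(cudaMalloc(&dOut, bytes));
    CUDA_CHECK(cudaMemcpy(dBig, hBig.data(), bytes, cudaMemcpyHostToDevice));

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    tileMultiply<<<grid, block>>>(dBig, dOut, N);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost));

    tileMultiplyCpu(hSmall.data(), hBig.data(), hRef.data(), N);
    int mismatches = 0;
    for (int i = 0; i < N * N; ++i)
        if (std::fabs(hOut[i] - hRef[i]) > 1e-5f) ++mismatches;

    CUDA_CHECK(cudaFree(dBig));
    CUDA_CHECK(cudaFree(dOut));
    return mismatches;
}

int main()
{
    std::printf("mismatches: %d\n", runExample());
    return 0;
}
```

This should build with nvcc (for example, nvcc tile_multiply.cu -o tile_multiply, where the file name is just a placeholder). Constant memory tends to work best when the threads of a warp read the same address at the same time; with the 16x16 block chosen here a warp spans two matrix rows, so each loop iteration reads two different c_small entries. For a 9-element matrix that is unlikely to matter much, but it is worth measuring before assuming constant memory beats a copy loaded into shared memory inside the kernel.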