Hi, I’m dealing with a PDE solver problem.

In every thread I need to multiply a matrix by a vector.

The vector is the solution data (it can be stored in global memory). The matrix is constant, and it is the same for every thread.

However, the matrix is about 1024 × 1024, which requires at least 4 MB of memory, far more than the 64 KB of constant memory.
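As a quick check of that size, assuming 4-byte single-precision (float) entries (double precision would double it to 8 MiB):

$$
1024 \times 1024 \times 4\ \text{bytes} = 4\,194\,304\ \text{bytes} = 4\ \text{MiB} = 64 \times 64\ \text{KiB}
$$

So the matrix is 64 times the size of constant memory.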

Since I need to read the matrix at every step of the calculation, putting it in global memory will hurt performance significantly.
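For concreteness, here is a minimal sketch of the setup described above, with the matrix left in global memory. It assumes single-precision data, a row-major n × n matrix, and one thread per output entry (one possible reading of "in every thread"). The names `matvec_global` and `run_matvec` are hypothetical, and error checking is omitted to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical sketch: y = A * x, where A is an n x n row-major matrix kept in
// global memory because it does not fit in 64 KB of constant memory.
// Assumption: each thread computes one output entry y[row].
__global__ void matvec_global(const float* __restrict__ A,
                              const float* __restrict__ x,
                              float* __restrict__ y,
                              int n)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n) return;  // guard threads past the end of the vector

    float sum = 0.0f;
    for (int col = 0; col < n; ++col) {
        // A is read from global memory on every launch, i.e. every solver step.
        sum += A[row * n + col] * x[col];
    }
    y[row] = sum;
}

// Hypothetical host helper: copies A and x to the device, launches the kernel,
// copies y back, and frees the device buffers.
std::vector<float> run_matvec(const std::vector<float>& A,
                              const std::vector<float>& x, int n)
{
    float *dA = nullptr, *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dA, sizeof(float) * n * n);
    cudaMalloc(&dx, sizeof(float) * n);
    cudaMalloc(&dy, sizeof(float) * n);
    cudaMemcpy(dA, A.data(), sizeof(float) * n * n, cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x.data(), sizeof(float) * n, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n rows
    matvec_global<<<blocks, threads>>>(dA, dx, dy, n);

    std::vector<float> y(n);
    // This copy waits for the kernel on the default stream to finish.
    cudaMemcpy(y.data(), dy, sizeof(float) * n, cudaMemcpyDeviceToHost);

    cudaFree(dA);
    cudaFree(dx);
    cudaFree(dy);
    return y;
}
```

For the problem in this post, `n` would be 1024, so every step reads the full 4 MB matrix from global memory, which is the cost the question is about.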

Is there a way to solve this problem?

Thanks.

Sorry, it seems I posted this question on the wrong board.