I am implementing matrix multiplication using shared memory, and everything works fine for matrices of up to 2048 numbers.
However, when I try 4096, it gives me an error:
“First chance exception”.
I calculated that the shared memory being used exceeds the total global memory. Is that the cause of the error? Does shared memory “belong to” global memory, i.e. the more shared memory I use, the less global memory I have?
If that is the case, how can I solve this issue?
Shared memory is separate from global memory and has a different scope: it is only visible to the block it belongs to, whereas global memory is visible to all blocks. Using more shared memory does not reduce the global memory available. When you say you calculated your shared memory usage, do you mean you multiplied the amount of shared memory used per block by the number of blocks launched by your kernel? That total is not the relevant figure. What matters is that the shared memory usage per block stays within the limit of your device (the deviceQuery example in the SDK reports it as “Total amount of shared memory per block”, among other things). As long as each block stays within that limit, your problem likely lies elsewhere.
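The same check can also be done in code. Below is a minimal sketch, assuming a tiled kernel that keeps two square float tiles in shared memory per block. The tile width `TILE` and the helper `sharedBytesPerBlock` are hypothetical names chosen for illustration, since the original post does not show its kernel. `cudaGetDeviceProperties`, `sharedMemPerBlock` and `totalGlobalMem` are part of the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical tile width; the original post does not state one.
constexpr int TILE = 16;

// Shared memory one block needs for two TILE x TILE float tiles
// (one tile of A and one tile of B). This is an assumption about the kernel.
constexpr size_t sharedBytesPerBlock() {
    return 2 * static_cast<size_t>(TILE) * TILE * sizeof(float);
}

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    const size_t needed = sharedBytesPerBlock();
    // The per-block limit is what matters, not usage summed over all blocks.
    printf("shared memory per block: limit %zu bytes, needed %zu bytes\n",
           prop.sharedMemPerBlock, needed);
    // Global memory is reported separately and is not consumed by shared memory.
    printf("total global memory: %zu bytes\n", prop.totalGlobalMem);

    if (needed > prop.sharedMemPerBlock) {
        fprintf(stderr, "tile too large for this device\n");
        return 1;
    }
    return 0;
}
```

If the per-block figure is within the limit, it may be worth checking the return value of every `cudaMalloc`, `cudaMemcpy` and kernel launch in the same way, since an unchecked failure there could be what surfaces as the exception at the larger size.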
Hope that helps.