Is it worth loading everything into shared memory?

I have a matrix and need to add a constant to each of its elements, say 2.
A kernel that does this basically performs the following steps per element:

1 - Read the element
2 - Add 2
3 - Write the element
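The three steps above map directly onto a kernel in which every thread handles one element and keeps it in a register between the load and the store. This is a minimal sketch, assuming a row-major `float` matrix stored contiguously in device memory. The names `addScalar` and `addScalarOnDevice`, the 32x8 block shape and the omitted error checking are illustrative assumptions:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: adds `value` to every element of a row-major
// rows x cols matrix. One thread per element; the element only lives
// in a register between the global load and the global store.
__global__ void addScalar(float* m, int rows, int cols, float value)
{
    // x indexes columns so consecutive threads in a warp touch
    // consecutive addresses, giving coalesced loads and stores.
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols) {
        int idx = row * cols + col;
        float v = m[idx];  // 1 - read element from global memory
        v += value;        // 2 - add the constant in a register
        m[idx] = v;        // 3 - write element back to global memory
    }
}

// Hypothetical host helper: copies the matrix to the device, launches the
// kernel and copies the result back (the second cudaMemcpy waits for the
// kernel on the default stream). Error checking omitted for brevity.
void addScalarOnDevice(std::vector<float>& h, int rows, int cols, float value)
{
    size_t bytes = h.size() * sizeof(float);  // assumes h.size() == rows * cols
    float* d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(32, 8);
    dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
    addScalar<<<grid, block>>>(d, rows, cols, value);

    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
}
```

Because the only per-element state is a single register, this version uses no shared memory at all.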

The question is: is it worth loading each submatrix into shared memory to perform the operation, or can I simply read, modify and write the elements through registers? I am assuming coalesced global memory accesses in both cases.
My thinking is that by using registers I save shared memory.

No, it is not worth it.

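With such simple processing you won't benefit from using shared memory. Each element is read once, modified once and written once, and no thread ever needs a value loaded by another thread, so staging the data in shared memory adds work without saving any global memory traffic. Just make sure your global memory reads and writes are coalesced.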
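For contrast, here is a sketch of the shared-memory variant the question asks about. It is a hypothetical illustration: the names `addScalarShared` and `addScalarSharedOnDevice` and the 32x8 tile size are assumptions. It is kept close to a plain per-element kernel so the extra steps are easy to spot:

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE_X 32  // hypothetical tile width (matches blockDim.x)
#define TILE_Y 8   // hypothetical tile height (matches blockDim.y)

// Hypothetical kernel: same element-wise addition, but staged through a
// shared-memory tile. Global traffic is unchanged (one load and one store
// per element); the tile only adds a shared store, a barrier and a shared load.
__global__ void addScalarShared(float* m, int rows, int cols, float value)
{
    __shared__ float tile[TILE_Y][TILE_X];

    int col = blockIdx.x * TILE_X + threadIdx.x;
    int row = blockIdx.y * TILE_Y + threadIdx.y;
    bool inside = (row < rows && col < cols);

    if (inside) tile[threadIdx.y][threadIdx.x] = m[row * cols + col];  // global -> shared
    // A staged tile normally needs this barrier; here it buys nothing,
    // because each thread only reads back the element it stored itself.
    __syncthreads();
    if (inside) m[row * cols + col] = tile[threadIdx.y][threadIdx.x] + value;  // shared -> global
}

// Hypothetical host helper mirroring the register-only version.
// Error checking omitted for brevity.
void addScalarSharedOnDevice(std::vector<float>& h, int rows, int cols, float value)
{
    size_t bytes = h.size() * sizeof(float);  // assumes h.size() == rows * cols
    float* d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE_X, TILE_Y);
    dim3 grid((cols + TILE_X - 1) / TILE_X, (rows + TILE_Y - 1) / TILE_Y);
    addScalarShared<<<grid, block>>>(d, rows, cols, value);

    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
}
```

Both variants issue exactly one global load and one global store per element. Shared memory starts paying off only when threads in a block reuse data loaded by other threads, as in tiled matrix multiplication or stencil computations, and an element-wise addition never does that.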