Many documents mention that shared memory allows a more flexible programming model that deviates from the older GPUs' strict streaming model, because of arbitrary gather/scatter …
I don't quite understand this. What was restrictive about the old GPU model? Could anybody give a more detailed or concrete explanation (ideally with examples)?
Shared memory makes it possible for threads in the same block to interact: one thread writes a value, the block synchronizes with __syncthreads(), and another thread reads it. Before that, inter-thread communication was not possible.
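To make the inter-thread communication point concrete, here is a minimal sketch of a block-wide sum, where each thread repeatedly reads values that other threads wrote into shared memory. The names `block_sum`, `sum_on_gpu` and `BLOCK` are made up for this illustration, it assumes a single block with at most `BLOCK` inputs, and error checking of the CUDA calls is left out for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // hypothetical block size; must be a power of two

// Each thread loads one element, then the threads cooperate through
// shared memory to add everything up (tree reduction).
__global__ void block_sum(const int* in, int* out, int n)
{
    __shared__ int partial[BLOCK];
    int t = threadIdx.x;

    partial[t] = (t < n) ? in[t] : 0;
    __syncthreads();  // make every thread's write visible to the whole block

    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (t < stride)
            partial[t] += partial[t + stride];  // read another thread's result
        __syncthreads();  // outside the if, so all threads reach it
    }

    if (t == 0)
        *out = partial[0];
}

// Host-side helper (an assumption of this sketch): copies the data over,
// launches one block, and returns the sum.
int sum_on_gpu(const std::vector<int>& host_in)
{
    int n = static_cast<int>(host_in.size());
    assert(n >= 1 && n <= BLOCK);

    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    block_sum<<<1, BLOCK>>>(d_in, d_out, n);

    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

In the older shader-style model each thread could only produce its own output element from its inputs, so a sum like this had to be spread over several passes with the intermediate results written out in between.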
Scatter/gather has nothing to do with shared memory. I believe gather (reading from arbitrary locations) was already possible, but you can now also scatter, i.e. write to arbitrary locations in global memory.
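A small sketch of the difference, in case it helps: a gather reads from a computed address and writes to the thread's own slot, while a scatter writes to a computed address. The kernel names, the helper `permute_on_gpu` and the index array `idx` are hypothetical, and the sketch assumes `idx` is a permutation of 0..n-1 so that no two threads write the same location. Error checking is omitted.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Gather: thread i READS from an arbitrary location, writes to its own slot.
__global__ void gather(const float* in, const int* idx, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[idx[i]];
}

// Scatter: thread i reads its own slot, WRITES to an arbitrary location.
__global__ void scatter(const float* in, const int* idx, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[idx[i]] = in[i];
}

// Host-side helper (an assumption of this sketch). idx must be a
// permutation of 0..n-1; this is not validated here.
std::vector<float> permute_on_gpu(const std::vector<float>& in,
                                  const std::vector<int>& idx,
                                  bool use_scatter)
{
    int n = static_cast<int>(in.size());
    assert(n >= 1 && idx.size() == in.size());

    float* d_in = nullptr;
    float* d_out = nullptr;
    int* d_idx = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_idx, n * sizeof(int));
    cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_idx, idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, n * sizeof(float));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (use_scatter)
        scatter<<<blocks, threads>>>(d_in, d_idx, d_out, n);
    else
        gather<<<blocks, threads>>>(d_in, d_idx, d_out, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_idx);
    return out;
}
```

Neither kernel touches shared memory; both work directly on global memory. If `idx` contained duplicates, the scatter version would have several threads writing the same element and the result would depend on which write lands last.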