How to understand why 'shared memory' is the key feature

Many documents mention that shared memory allows a more flexible programming model, deviating from the old GPUs' strict streaming model, because of arbitrary gather/scatter …

I don't quite understand this. What is restrictive about the old GPU model? Could anybody give a more detailed or more concrete explanation (with examples, for instance)?


Shared memory makes it possible for the threads of a block to interact: one thread writes a value, all threads synchronize with __syncthreads(), and another thread reads that value. Before, inter-thread communication was not possible.
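To make that concrete, here is a minimal sketch of threads exchanging data through shared memory. The kernel name reverseBlock, the helper runReverse and the block size N are made up for this example, and it assumes a single block of N threads:

```cuda
#include <cuda_runtime.h>

#define N 64  // hypothetical block size, chosen only for this sketch

// Each thread stores one value in shared memory, waits at the barrier,
// then reads the value that a *different* thread stored.
__global__ void reverseBlock(int *data)
{
    __shared__ int tile[N];        // visible to every thread in the block
    int t = threadIdx.x;

    tile[t] = data[t];             // thread t publishes its element
    __syncthreads();               // wait until all N elements are written
    data[t] = tile[N - 1 - t];     // read the element written by thread N-1-t
}

// Host helper (hypothetical): fills 0..N-1, runs the kernel on one block,
// and reports whether the array came back reversed.
bool runReverse()
{
    int h[N];
    for (int i = 0; i < N; ++i) h[i] = i;

    int *d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return false;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);

    reverseBlock<<<1, N>>>(d);

    // This copy waits for the kernel to finish before reading the result.
    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i)
        if (h[i] != N - 1 - i) return false;
    return true;
}
```

Without the __syncthreads() call, a thread could read tile[N - 1 - t] before the other thread has written it. Calling runReverse() from main is enough to try it out.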

Scatter/gather has nothing to do with shared memory. I believe gather (reading from arbitrary locations) was already possible, but you can now also scatter, that is, write to arbitrary locations in global memory.
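A rough sketch of the difference, in case it helps. The kernel names gather and scatter, the index array idx and the helper runScatter are invented for this example, and idx is assumed to be a permutation so that no two threads write the same location:

```cuda
#include <cuda_runtime.h>

// Gather: thread i reads from a computed address and writes to its own slot.
__global__ void gather(const float *in, const int *idx, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[idx[i]];
}

// Scatter: thread i reads its own slot and writes to a computed address.
__global__ void scatter(const float *in, const int *idx, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[idx[i]] = in[i];
}

// Host helper (hypothetical): scatters 8 values with idx[i] = (i + 1) % n
// and checks that every value landed where idx said it should.
bool runScatter()
{
    const int n = 8;
    float hIn[n], hOut[n];
    int hIdx[n];
    for (int i = 0; i < n; ++i) {
        hIn[i] = 10.0f * i;
        hIdx[i] = (i + 1) % n;   // a permutation, so writes never collide
    }

    float *dIn = nullptr, *dOut = nullptr;
    int *dIdx = nullptr;
    if (cudaMalloc(&dIn, sizeof(hIn)) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, sizeof(hOut)) != cudaSuccess) { cudaFree(dIn); return false; }
    if (cudaMalloc(&dIdx, sizeof(hIdx)) != cudaSuccess) { cudaFree(dIn); cudaFree(dOut); return false; }

    cudaMemcpy(dIn, hIn, sizeof(hIn), cudaMemcpyHostToDevice);
    cudaMemcpy(dIdx, hIdx, sizeof(hIdx), cudaMemcpyHostToDevice);

    scatter<<<1, n>>>(dIn, dIdx, dOut, n);

    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dIdx);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (hOut[hIdx[i]] != hIn[i]) return false;
    return true;
}
```

Neither kernel touches shared memory; both operate on global memory only, and the only difference is which side of the assignment uses the computed index.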