How much faster is shared memory than global memory? Has anyone run some tests?

Is there any information on exactly how much faster read/write access to shared memory is compared with global memory?

Has anyone run tests and gotten some empirical numbers?

See the benchmarks I made in this post: The Official NVIDIA Forums | NVIDIA

For comparison, global memory can reach 70 GiB/s.
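For context on where figures like these come from: a bandwidth benchmark typically divides the total bytes read and written by the elapsed kernel time. The Python sketch below assumes that convention (it is not the code from the linked post). It also converts GiB/s (2^30 bytes per second) to GB/s (10^9 bytes per second), because the two units are easy to mix up. The timing numbers in the usage are hypothetical.

```python
GIB = 2**30  # bytes in one gibibyte (binary unit, GiB)
GB = 10**9   # bytes in one gigabyte (decimal unit, GB)


def effective_bandwidth_gib_s(bytes_read: int, bytes_written: int, seconds: float) -> float:
    """Effective bandwidth in GiB/s: total bytes moved divided by elapsed time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (bytes_read + bytes_written) / seconds / GIB


def gib_s_to_gb_s(gib_per_s: float) -> float:
    """Convert a rate in GiB/s to GB/s (the GB/s value is about 7% larger)."""
    return gib_per_s * GIB / GB


if __name__ == "__main__":
    # Hypothetical run: a kernel reads 256 MiB and writes 256 MiB in 7.5 ms.
    n = 256 * 2**20
    bw = effective_bandwidth_gib_s(n, n, 7.5e-3)
    print(f"{bw:.2f} GiB/s = {gib_s_to_gb_s(bw):.2f} GB/s")
```

Note that both units are bytes per second; "Gb/s" would normally mean gigabits per second, which is a factor of 8 smaller.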

So 111.5 GiB/s for shared memory and 70 GiB/s for global memory?

That’s not a very big difference. Am I missing something?

Well, the 111 GiB/s figure was obtained with random accesses, which cause lots of bank conflicts. With warp-coherent access (no bank conflicts), the benchmark reached 233.36 GiB/s.
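To illustrate why random access costs so much, here is a small Python model of shared memory bank conflicts. It assumes G80-era hardware: 16 banks of 32-bit words, with conflicts resolved per half-warp of 16 threads, and word `w` living in bank `w % 16`. It also treats several threads reading the same word as a broadcast (no conflict), which is a simplification of the real hardware rules. The conflict degree is the largest number of distinct words any one bank must serve; an access with degree k is split into roughly k serialized passes.

```python
import random
from collections import defaultdict

NUM_BANKS = 16  # assumption: 16 banks of 32-bit words (G80-era hardware)
GROUP = 16      # assumption: bank conflicts are resolved per half-warp of 16 threads


def conflict_degree(word_indices):
    """Return the max number of distinct words mapped to a single bank.

    1 means conflict-free; k means the access is serialized into about k passes.
    Threads reading the same word are counted once (treated as a broadcast).
    """
    words_per_bank = defaultdict(set)
    for w in word_indices:
        words_per_bank[w % NUM_BANKS].add(w)
    return max((len(s) for s in words_per_bank.values()), default=0)


def strided(stride, base=0):
    """Word indices read when thread t reads word base + stride * t."""
    return [base + stride * t for t in range(GROUP)]


if __name__ == "__main__":
    print("coherent (stride 1):", conflict_degree(strided(1)))
    print("stride 2:", conflict_degree(strided(2)))
    print("stride 16:", conflict_degree(strided(16)))
    rng = random.Random(0)
    random_words = [rng.randrange(4096) for _ in range(GROUP)]
    print("random:", conflict_degree(random_words))
```

In this model, coherent access (thread t reads word t) touches every bank exactly once, whereas random indices usually pile several distinct words onto some bank, which is consistent with the random-access benchmark reaching roughly half the coherent-access bandwidth.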

Not only is the bandwidth much larger but, more importantly, the latency is much lower. Reads and writes to shared memory (about 1 cycle if there are no bank conflicts) complete almost instantaneously compared with the huge (roughly 200-cycle) global memory latency.
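A rough way to see why latency matters more than the bandwidth ratio suggests: a bandwidth benchmark issues many independent loads that overlap, so only one latency is exposed, while code where each load depends on the previous one (pointer chasing, for example) pays the full latency every time. The Python sketch below is an idealized model, not a measurement; it assumes one load issued per cycle and plugs in the approximate 1-cycle and 200-cycle figures quoted above.

```python
SHARED_LATENCY_CYCLES = 1    # approximate figure quoted above (no bank conflicts)
GLOBAL_LATENCY_CYCLES = 200  # approximate figure quoted above


def dependent_chain_cycles(num_loads: int, latency_cycles: int) -> int:
    """Each load needs the previous result, so nothing overlaps."""
    return num_loads * latency_cycles


def pipelined_cycles(num_loads: int, latency_cycles: int, issue_interval: int = 1) -> int:
    """Idealized independent loads: issued every issue_interval cycles and
    overlapping, so only the first load's latency is exposed."""
    if num_loads == 0:
        return 0
    return latency_cycles + (num_loads - 1) * issue_interval


if __name__ == "__main__":
    n = 1000
    for name, lat in (("shared", SHARED_LATENCY_CYCLES), ("global", GLOBAL_LATENCY_CYCLES)):
        print(f"{name}: independent={pipelined_cycles(n, lat)} cycles, "
              f"dependent={dependent_chain_cycles(n, lat)} cycles")
```

Under these assumptions, 1000 independent loads take about the same time from either memory, but a chain of 1000 dependent loads is about 200 times slower from global memory, a gap the bandwidth numbers alone do not show.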