General global coalesced access questions...

Okay, I’ll admit, I’m bad at CUDA. It’s because being good at CUDA requires a much deeper understanding of the hardware than CPU programming does.

So, I realize that I don’t fully understand memory coalescing. Let’s also assume I’m using a card with a high compute capability, so 3.5 and above, or something like that.

I was going to write what I think is going on but then I read it back and it just looked awful.

Does anyone wanna give me a quick rundown or relevant links on memory coalescing on modern CUDA hardware? I’ll be checking the CUDA Programming Guide after this.

This may be of interest:

The linked presentation:

is Fermi-centric, but covers most of the important concepts. Understanding coalescing on Fermi covers about 95% of what you need to know to understand it on Kepler; the remainder is covered in the answer write-up. So I would suggest looking at the presentation first.

Thank you! The person on Stack Overflow explained that really well. I’m gonna look at the presentation now.