Testing coalesced access pattern

CoalescedMemTest.zip (1.63 MB)

Hello, friends.
I am trying to measure the performance difference between coalesced and uncoalesced memory access patterns.
I have written a project for this purpose.
I believe I followed the guidelines for coalesced memory access from the CUDA programming reference guide.
However, I cannot measure any meaningful difference between the two.
I have attached my project.
Please take a look at it and let me know where my mistake is.

I would appreciate your help.

Wow, a 1.63 MB zipfile, including all temporary files generated during compilation. I’ll never ask you for self-contained code…

What GPU are you testing this on? The penalty for non-coalesced access may be much smaller on compute capability 2.x devices due to their cache. I haven’t looked at your actual code (could not be bothered to even look for it among the 86 items in the archive you posted). However, to benchmark this, you will have to carefully design your code so that the cache cannot hide the effect.
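For what it is worth, here is a minimal sketch of how such a benchmark could be set up. It is not taken from the attached project: the kernel name `copyStrided`, the helper `timeKernel`, the buffer size, and the strides are all assumptions chosen for illustration. The idea is to use buffers far larger than any on-chip cache and to compare a stride of 1 (neighbouring threads read neighbouring floats) against larger odd strides (neighbouring threads read addresses far apart).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread copies exactly one float.
// Assumes n is a power of two and stride is odd, so that
// (tid * stride) & (n - 1) visits every element exactly once.
__global__ void copyStrided(const float* in, float* out,
                            unsigned int n, unsigned int stride)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        unsigned int idx = (tid * stride) & (n - 1);
        out[idx] = in[idx];
    }
}

// Hypothetical helper: average time of one launch in milliseconds.
static float timeKernel(const float* in, float* out,
                        unsigned int n, unsigned int stride)
{
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    const int reps    = 20;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    copyStrided<<<blocks, threads>>>(in, out, n, stride);  // warm-up launch

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        copyStrided<<<blocks, threads>>>(in, out, n, stride);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("%s, compute capability %d.%d\n", prop.name, prop.major, prop.minor);

    // 4M floats = 16 MB per buffer (assumed size, much larger than the cache).
    const unsigned int n = 1u << 22;
    float *in = 0, *out = 0;
    cudaMalloc((void**)&in,  n * sizeof(float));
    cudaMalloc((void**)&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    // Stride 1 is the coalesced case; the odd strides scatter a warp's accesses.
    const unsigned int strides[] = {1, 3, 17, 33};
    for (int i = 0; i < 4; ++i) {
        float ms = timeKernel(in, out, n, strides[i]);
        double gbPerSec = 2.0 * n * sizeof(float) / (ms * 1.0e6);  // read + write
        printf("stride %2u: %.3f ms, %.1f GB/s\n", strides[i], ms, gbPerSec);
    }

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

Every stride moves the same amount of data with the same arithmetic per thread, so any difference in the reported bandwidth should come from the access pattern alone. If the numbers for stride 1 and stride 33 are nearly identical, the kernel is probably not touching memory the way you think it is.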

Thank you for your response.

I tested my code on a GTX285.

I uploaded the complete project, which is why the zip file is so large.

In the programming guide from NVIDIA, the effect of the coalesced access pattern on the GTX280 looks significant.

First of all, I will test my code as you recommended.