Different Performance but same ptx Code

sietsch · May 11, 2011, 12:36pm

Hi there,

I have implemented two versions of the same algorithm:

Version 1:

works on a 1-dim array arr[x*y]

for (a = 0; a < rows; a++)
 for (i = 0; i < cols; i++)
  arr[i * cols + a] = 0;

Version 2:

works on a 2-dim array arr[y]

for (a = 0; a < rows; a++)
 for (i = 0; i < cols; i++)
  arr[a][i] = 0;

The Nsight Analyser reveals that version 2 performs 30% faster than version 1.

I looked into the ptx code as I assumed this would give me an idea where the performance boost of version 2 comes from.

Surprisingly, the two ptx files differ only in some very minor points (unsigned here, signed there…)

Does anybody have a hint where to look for the reason for this behavior?

Thanks,

Sietsch.

avidday · May 11, 2011, 1:28pm

Where is arr stored?

tera · May 11, 2011, 2:10pm

The equivalent to version 2 would be this version 3:

for (a = 0; a < rows; a++)
 for (i = 0; i < cols; i++)
  arr[a * cols + i] = 0;
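To make the difference concrete, here is a minimal sketch in plain C that compares the offset from version 1 with the row-major offset behind arr[a][i]. The helper names (idx_row_major, idx_version1, count_mismatches, max_offset_version1) are made up for illustration and are not from the original code; the sketch assumes rows and cols are both at least 1.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only; not part of the original code. */

/* Row-major offset of element (row a, column i): the element arr[a][i]
   refers to in a rows x cols array, and what version 3 computes. */
static size_t idx_row_major(size_t a, size_t i, size_t cols)
{
    return a * cols + i;
}

/* Offset computed by version 1 as posted: a and i are swapped. */
static size_t idx_version1(size_t a, size_t i, size_t cols)
{
    return i * cols + a;
}

/* Number of (a, i) pairs for which version 1 writes to a different
   element than arr[a][i] would. */
static size_t count_mismatches(size_t rows, size_t cols)
{
    size_t n = 0;
    for (size_t a = 0; a < rows; a++)
        for (size_t i = 0; i < cols; i++)
            if (idx_version1(a, i, cols) != idx_row_major(a, i, cols))
                n++;
    return n;
}

/* Largest offset version 1 touches; assumes rows >= 1 and cols >= 1. */
static size_t max_offset_version1(size_t rows, size_t cols)
{
    return (cols - 1) * cols + (rows - 1);
}
```

For rows = 2 and cols = 3, the two offsets agree only where a == i, and version 1 reaches offset 7 although the array has just 6 elements. In general, when cols > rows the version 1 index runs past rows * cols - 1. Its inner loop also steps through memory with a stride of cols instead of 1, so the two versions do not do the same work.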

sietsch · May 17, 2011, 9:36am

Hi there,

sorry for the late reply.

@avidday,
arr is stored in shared memory.

@tera
Good point! ;-)
The code shown here is only an abstraction of my real code. I’m currently checking to see if I made the same mistake in my real code…

Best regards and thanks for your help,
Sietsch.

mneubert · May 17, 2011, 9:53am

hi sietsch,

tera's version 3 is what your version 1 should look like in order to be equivalent to your version 2. The indexing in your first code snippet is wrong. Please run your test again after correcting it.

Are rows and cols static variables or defines, or can they change during execution (across multiple calls of that loop)?

Yours,
mneubert

sietsch · May 17, 2011, 10:21am

Hi there,

yes, I found the same error that tera and mneubert pointed out in my real code.
I fixed it, and now both versions run at the same speed.

Thanks for your help!!!

Cheerz,
Sietsch.