_align is slower than not _aligned!

Hi,

In SDK 1.1, in the alignTest sample, comment out all the runTest calls except these two:

    runTest<RGB32_misaligned>(12);

    runTest<RGB32>(12);

Results on 8800 GTX + Intel P4 3.2 GHz + XP + VS05:

RGB32_misaligned...

Avg. time: 16.270977 ms / Copy throughput: 2.861913 GB/s.

TEST PASSED

Testing aligned types...

RGB32...

Avg. time: 22.664101 ms / Copy throughput: 2.054621 GB/s.

TEST PASSED

Results on 8800 GTX + Intel Core 2 Duo quad-core + XP + VS05:

RGB32_misaligned...

Avg. time: 15.896337 ms / Copy throughput: 2.929362 GB/s.

TEST PASSED

Testing aligned types...

RGB32...

Avg. time: 21.749756 ms / Copy throughput: 2.140995 GB/s.

TEST PASSED

My own code, which uses a struct {int a; unsigned int b, c;}, also only got slower with _align(16). Any suggestions? Thanks!
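One thing worth checking for the three-field struct: forcing 16-byte alignment also pads it from 12 to 16 bytes, so every element moves a third more memory. This may or may not account for the slowdown, but it is easy to confirm. A minimal host-side sketch, assuming 4-byte int; `Plain`, `Padded` and `arrayBytes` are hypothetical names, and standard C++ `alignas(16)` stands in for the CUDA alignment qualifier so it compiles without nvcc:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the struct in the post above.
// alignas(16) is standard C++, used here in place of the CUDA qualifier.
struct Plain {
    int a;
    unsigned int b, c;
};

struct alignas(16) Padded {
    int a;
    unsigned int b, c;
};

// Bytes occupied by an array of n elements of type T.
template <typename T>
constexpr std::size_t arrayBytes(std::size_t n) {
    return n * sizeof(T);
}

// Compile-time checks; they assume a 4-byte int.
static_assert(sizeof(int) == 4, "sketch assumes 4-byte int");
static_assert(sizeof(Plain) == 12, "three 4-byte fields, no padding");
static_assert(sizeof(Padded) == 16, "size is rounded up to the alignment");
static_assert(alignof(Padded) == 16, "alignment requested by alignas");

// Runtime version of the same comparison for a buffer of n elements.
void checkBufferSizes(std::size_t n) {
    assert(arrayBytes<Padded>(n) == arrayBytes<Plain>(n) + 4 * n);
}
```

For a buffer of 1,000,000 elements, `arrayBytes<Plain>` gives 12,000,000 bytes and `arrayBytes<Padded>` gives 16,000,000 bytes.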

Same timings here…

RGB32_misaligned…

Avg. time: 16.194593 ms / Copy throughput: 2.875412 GB/s.

RGB32…

Avg. time: 22.469749 ms / Copy throughput: 2.072392 GB/s.

Looking at the .ptx code, the compiler is breaking up a 4-element vector store into multiple stores (thus breaking coalescing) in the aligned RGB32 case. A bug has been filed for this; thanks for the catch.

Paulius
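To illustrate why splitting the store matters, here is a rough host-side sketch of the addresses one store instruction touches across neighboring threads. It is not taken from the SDK sample or the .ptx; the function names are hypothetical, and the 4-byte element with 1-byte components is an assumption based on the RGB32 name. It only checks whether a single instruction covers one gap-free range, which is the property a coalesced access needs:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte offsets written by `threads` consecutive threads in ONE store instruction.
// Each thread owns one element of `elemBytes` bytes; the instruction writes
// `storeBytes` bytes starting `part * storeBytes` into that element.
std::vector<std::size_t> storeOffsets(std::size_t threads, std::size_t elemBytes,
                                      std::size_t storeBytes, std::size_t part) {
    std::vector<std::size_t> offsets;
    for (std::size_t t = 0; t < threads; ++t)
        offsets.push_back(t * elemBytes + part * storeBytes);
    return offsets;
}

// True when each thread's store begins exactly where the previous thread's
// store ended, i.e. the instruction covers one gap-free range of memory.
bool isContiguous(const std::vector<std::size_t>& offsets, std::size_t storeBytes) {
    for (std::size_t i = 1; i < offsets.size(); ++i)
        if (offsets[i] != offsets[i - 1] + storeBytes) return false;
    return true;
}
```

With 16 threads, `isContiguous(storeOffsets(16, 4, 4, 0), 4)` is true: one 4-byte store per thread fills a single 64-byte range. `isContiguous(storeOffsets(16, 4, 1, 0), 1)` is false: when the same element is written as four 1-byte stores, each instruction writes one byte and then skips three.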

Sorry, the SDK readme.txt has listed this under “known issues” for a long time…