Hi!
I think I found a bug in nvcc.
This is how to reproduce it:
tar xzf buggy.tar.gz
cd Buggy
make; ./buggy
8D B6 A8 7D 12 2D 6F DA CB CB 0B CB CB CB 0B CB
5D 5D 69 5D 5D 5D 69 5D 6A 6A B5 6A 6A 6A B5 6A
72 72 D5 72 72 72 D5 72 D0 D0 67 D0 D0 D0 67 D0
47 47 01 47 47 47 01 47 15 15 54 15 15 15 54 15
That’s the correct output.
make clean; make BUG=1; ./buggy
00 00 00 00 12 2D 6F DA CB CB 0B CB CB CB 0B CB
00 00 00 00 5D 5D 69 5D 6A 6A B5 6A 6A 6A B5 6A
00 00 00 00 72 72 D5 72 D0 D0 67 D0 D0 D0 67 D0
00 00 00 00 47 47 01 47 00 00 00 00 15 15 54 15
^^^^^^^^^^^ This part is clearly wrong; bytes 9 to 12 of the last row are zeroed as well.
The only difference between the two versions is in whirl.cu, line 673:
With BUG defined, an integer array is copied using a for loop; without BUG defined, it is copied with memcpy.
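To make the difference concrete, here is a minimal sketch of the two copy variants. It is not the actual code from whirl.cu: the function name copy_state, the element type uint32_t, and the array length of 16 are assumptions made up for illustration, and it is written as plain host C rather than device code. It also assumes that "make BUG=1" ends up defining the BUG macro.

```c
#include <stdint.h>
#include <string.h>

#define STATE_WORDS 16  /* hypothetical length; the real array size is not shown here */

/* Hypothetical helper: copy an integer array in one of two ways,
   selected at compile time by the BUG macro. */
static void copy_state(uint32_t *dst, const uint32_t *src)
{
#ifdef BUG
    /* Variant built with BUG defined: element-wise copy in a for loop. */
    int i;
    for (i = 0; i < STATE_WORDS; i++)
        dst[i] = src[i];
#else
    /* Variant built without BUG: the same copy done with memcpy. */
    memcpy(dst, src, STATE_WORDS * sizeof(uint32_t));
#endif
}
```

Both variants should leave dst with exactly the same contents, so they should not produce different program output.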
The bug disappears when the loop in line 650 is unrolled, i.e. when instead of
for (r = 0; r < 2; r++)
{
    Transform2(K, 0);
    Transform1(S, K);
}
this code is used:
Transform2(K, 0);
Transform1(S, K);
Transform2(K, 0);
Transform1(S, K);
Is this a compiler bug or a bug in hardware?
(I reproduced the bug with another GTX 460 card, too.)
I could not reproduce it on a Tesla C1060.
cuda-memcheck does not complain.
Please help me!
Kind regards
Twofisher
nvidia-bug-report.log.gz (46.4 KB)
buggy.tar.gz (4.58 KB)