As several source files are involved, I have packaged them into a zip archive and uploaded it:
src.zip (4.1 KB)
The source code implements Montgomery modular reduction on a finite field, using different implementation approaches on the host and device sides. The device entry point is the ‘kernel’ function in ‘main.cu’.
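For readers unfamiliar with the technique, here is a minimal single-word sketch of Montgomery reduction in host-side C++. It is not taken from src.zip: the attached code works on multi-limb values, while this sketch assumes a single odd 32-bit modulus and R = 2^32, and the names neg_inv32, mont_reduce and mod_mul are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Returns -n^{-1} mod 2^32 for an odd modulus n (Newton iteration).
uint32_t neg_inv32(uint32_t n) {
    assert(n & 1u);               // Montgomery reduction needs an odd modulus
    uint32_t inv = n;             // n * n == 1 (mod 8), so 3 bits are already correct
    for (int i = 0; i < 5; ++i)   // each step at least doubles the correct bits
        inv *= 2u - n * inv;
    return 0u - inv;
}

// Montgomery reduction: for t < n * 2^32, returns t * 2^-32 mod n.
uint32_t mont_reduce(uint64_t t, uint32_t n, uint32_t n_prime) {
    uint32_t m = static_cast<uint32_t>(t) * n_prime;  // m = t * n' mod 2^32
    uint64_t mn = static_cast<uint64_t>(m) * n;
    uint64_t sum = t + mn;                   // low 32 bits are zero by construction
    uint64_t carry = (sum < t) ? 1u : 0u;    // the sum can exceed 64 bits
    uint64_t u = (sum >> 32) | (carry << 32);
    if (u >= n) u -= n;                      // u < 2n, so one subtraction suffices
    return static_cast<uint32_t>(u);
}

// Computes a * b mod n through the Montgomery domain; requires a, b < n.
uint32_t mod_mul(uint32_t a, uint32_t b, uint32_t n) {
    uint32_t np = neg_inv32(n);
    // Convert to Montgomery form: x -> x * 2^32 mod n.
    uint32_t am = static_cast<uint32_t>((static_cast<uint64_t>(a) << 32) % n);
    uint32_t bm = static_cast<uint32_t>((static_cast<uint64_t>(b) << 32) % n);
    // Product in Montgomery form, then convert back to the ordinary form.
    uint32_t pm = mont_reduce(static_cast<uint64_t>(am) * bm, n, np);
    return mont_reduce(pm, n, np);
}
```

Calling mod_mul(5, 7, 97) should give the same value as (5 * 7) % 97. A multi-limb version repeats the same reduction step once per limb with carry propagation between limbs.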
The debug and release builds also produce inconsistent results: the release build outputs correct results, but the debug build outputs incorrect results.
The command to compile the release version is:
nvcc -o test main.cu
The output of the release version:
e98b9564, a92043ac, b25e5075, 70d69a83, 2f4a1a59, 1f8ade1c, 8c1d97e5, 343b588d, 108ce2db, d4df2d9b, f276f5d6, 1795837
e98b9564, a92043ac, b25e5075, 70d69a83, 2f4a1a59, 1f8ade1c, 8c1d97e5, 343b588d, 108ce2db, d4df2d9b, f276f5d6, 1795837
The command to compile the debug version is:
nvcc -o test -G main.cu
The output of the debug version:
e98b9564, a92043ac, b25e5075, 70d69a83, 2f4a1a59, 1f8ade1c, 8c1d97e5, 343b588d, 108ce2db, d4df2d9b, f276f5d6, 1795837
54331d22, 6410f330, 9badc234, 28c1d693, 8acd7b6b, c1f71e54, c66c6c90, d3b5a2ef, e87388f6, 9854398c, 34839e1, c812a3
The two lines of numbers in the output should be exactly the same.