nvcc compiler bug

Hello,

I came across what I believe is a bug in the compiler (version 3.1, both Linux and Windows). Here’s the smallest piece of code I could trigger it with:

http://eden.dei.uc.pt/~sneves/test.cu

Corresponding PTX:

http://eden.dei.uc.pt/~sneves/test.ptx

The problem lies in line 98: an unsigned 32-bit integer is widened to a larger unsigned type, but the conversion treats it as signed and sign-extends it. This in turn causes the code to produce incorrect results.

Would it be possible to post a self-contained, runnable little program, stating both the actual and the expected output? I would be happy to look at the app and file a compiler bug if necessary. Thanks!

https://eden.dei.uc.pt/~sneves/serial.cu

The above program outputs this:

./serial
BEBCBE18275AAE1E0FFF7797978491E0AF2FFAB0164086B6774253F72EBA\
E17FD09EF039F4279DF1BC6E6498604D2E76470AD8F60BFDB9A10902BA19\
1A9B6C07
B165AFFF433425C45690746866E4DE2173B60875DB673A1FD75BC9D76EA8\
39A4B2D1088C8F83DE0E306894A08F552B1743B71ECF7384175A20F1A244\
9A550895
F12BE876\
FFFFFFFFA71F27DA893F93F891168CEAB67A7A853BDE13B25C1AB7E68198\
C1FDA0D88E115F83879F6F6F8245442169D5F13BA95EC443E450E9F6CE70\
C5312D5F0A41AAF2039B68B5

The last number is the output from the kernel (the numbers are wrapped over several lines). Let the first number be X, the second Y and the third Z; the last one is X + Y*Z (all multiple-precision integers). The most significant word of the result is wrong: it should be 0. If, in the function addmul_1, I change the type of “cy” to u64, the result is correct, since no sign extension is performed.

Thanks for the quick response. I will take a look at the app and let you know what I find.

I was able to reproduce the problem on WinXP64 with a GTX285. Thank you for taking the time to reduce the code to an easily runnable standalone app, and for bringing this issue to our attention. I will file a compiler bug. For ease of debugging, I further simplified to a repro case with N=1:

x= 1A9B6C07
e= 9A550895
m= F12BE876
result= FFFFFFFF91648581039B68B5

I agree with your analysis that the instruction cvt.u64.s32 in the generated PTX is a key component of the incorrect behavior.

My experiments indicate that the problem disappears when the Open64 optimization level is lowered to -O2. I would suggest trying that as a workaround: simply add ‘-Xopencc -O2’ to the nvcc invocation.

Indeed, -O2 makes it go away. Thanks. By the way, I’ve noticed that -Os, which appears to be not very different from -O2, also triggers the problem. That might be useful for narrowing down the cause.