The following code compiles successfully with CUDA 8.0 on a 1080 Ti, but after moving to a T4 card with CUDA 10.0 it fails to compile: the compiler hangs on this device function and does not report any errors.
    __device__ static float example(float a,
    float corr, tmp1;
    float PI = 3.1415926f;
    for (n = -20; n < 20; n += 1) {
        tmp = 0.0;
        tmp1 = (float) (PI * n);
        for (k = 0; k < 10; k++, tmp += tmp1) {
            corr += (float) ((a[k]*c[k] + b[k]*d[k]) * cos(tmp));
            corr += (float) ((b[k]*c[k] - a[k]*d[k]) * sin(tmp));
        }
    }
However, after changing cos(tmp) and sin(tmp) to cos((double)tmp) and sin((double)tmp), it compiles successfully with CUDA 10.0 on the T4 card.
Why does the code compile with “tmp” as a “float” under CUDA 8.0, but only compile under CUDA 10.0 once “tmp” is cast to “double”?
Please format your code properly; for example, here are instructions.
Please provide a complete example.
It’s not clear whether, when you say “compile”, you mean compilation (running the compiler) or actually running the executable code. For example, you say this: “it will hang up in this device function and will not report any errors.”
Do you mean the compiler hangs in the device function (I’m not sure how you would know that), or do you mean that when you run the code, it hangs in the device function (and how do you know that)?
At any rate, a complete example will help others to help you.
Sorry, I didn’t describe it clearly before. “it will hang up in this device function” means that, during compilation, the build output stays on the file containing this function indefinitely. Compilation never proceeds past it and no “.o” file is produced for that file, as if the compiler were dead.
In addition, this device function is declared static. After changing it to a dynamic (non-static) function, it compiles successfully with CUDA 10.0 on the T4 even when the data type of “tmp” is left as “float”.
With CUDA 8.0 it compiles successfully without any modification. This problem confuses me a great deal.
If you wish to address the first 2 things I asked for, I’ll take a look, as time permits. Do as you wish, of course.
Thanks for your reply. This problem occurs in a set of very complex modules with a large amount of code, and this example device function sits deep in the call chain, so I’m sorry that I can’t provide the code.
I tried to reproduce the “cos” and “sin” compilation problem with a simpler set of code, but the problem did not reappear, so I can’t provide a complete demo that reproduces it.
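
For anyone who wants to try, a reduced test case could start from something like the sketch below. It is only a guess at the shape of the original: the parameter list (four 10-element float arrays), the zero initial value of corr, the use of `__host__ __device__` instead of `__device__`, the `USE_DOUBLE_TRIG` switch, and the names `probe_kernel` and `host_sanity_check` are all assumptions that do not appear in the post. Since simpler code did not reproduce the hang, this may well compile cleanly.

```cpp
// Reduced test case sketch. Assumed, not from the post: the parameter list,
// corr starting at 0, __host__ __device__ instead of __device__, the
// USE_DOUBLE_TRIG switch, and the names probe_kernel / host_sanity_check.
#include <assert.h>
#include <math.h>

#ifndef __CUDACC__
// Plain C++ build: let the CUDA qualifiers expand to nothing.
#define __host__
#define __device__
#endif

__host__ __device__ static float example(const float *a, const float *b,
                                         const float *c, const float *d)
{
    const float PI = 3.1415926f;
    float corr = 0.0f;
    for (int n = -20; n < 20; n += 1) {
        float tmp = 0.0f;
        const float tmp1 = PI * n;
        for (int k = 0; k < 10; k++, tmp += tmp1) {
#ifdef USE_DOUBLE_TRIG
            // Variant reported to compile with CUDA 10.0: double-precision calls.
            corr += (float)((a[k] * c[k] + b[k] * d[k]) * cos((double)tmp));
            corr += (float)((b[k] * c[k] - a[k] * d[k]) * sin((double)tmp));
#else
            // Single-precision calls, spelled explicitly as cosf/sinf.
            corr += (a[k] * c[k] + b[k] * d[k]) * cosf(tmp);
            corr += (b[k] * c[k] - a[k] * d[k]) * sinf(tmp);
#endif
        }
    }
    return corr;
}

#ifdef __CUDACC__
// Referencing example() from a kernel makes nvcc generate device code for it.
__global__ void probe_kernel(const float *a, const float *b,
                             const float *c, const float *d, float *out)
{
    *out = example(a, b, c, d);
}
#endif

// Host-side check of the arithmetic only; it says nothing about the hang.
void host_sanity_check()
{
    float zeros[10] = {0.0f};
    float ones[10];
    for (int k = 0; k < 10; k++) ones[k] = 1.0f;
    assert(example(zeros, zeros, zeros, zeros) == 0.0f);
    // 400 terms of 2*cos(...) plus 400 zero terms: magnitude at most 800.
    assert(fabsf(example(ones, ones, ones, ones)) <= 800.0f);
}
```

Saved under a hypothetical name such as repro.cu, it could be compiled with `nvcc -arch=sm_75 -c repro.cu`, once as is and once with `-DUSE_DOUBLE_TRIG`, to compare the two variants on the T4 toolchain. The kernel is never launched; it is there only so that the compiler has to generate device code for the function.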