When I compile my code I get the following error:
nvopencc ERROR: C:\NVIDIA\CUDA\bin64/…/open64/lib//be.exe returned non-zero status -1073741571
Further investigation showed:
Unhandled exception at 0x004a54e8 in be.exe: 0xC00000FD: Stack overflow.
No matter how I transform my CUDA code, it either does nothing useful or the compiler crashes.
Since the crash happens at the nvopencc stage, I wondered whether I could pass some option to nvopencc via nvcc's -Xopencc parameter.
However, I failed to find a list of possible options. I checked the “The CUDA Compiler Driver NVCC” PDF and Google.
Or maybe this is pointless because be.exe is some other component? If so, what is it?
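For reference, the exit status reported by nvopencc and the exception code shown by the debugger appear to be the same 32-bit value, printed once as a signed decimal and once as unsigned hex. A minimal sketch of the conversion follows; the helper names toNtStatus and toHex are made up for illustration and are not part of any toolkit.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Reinterpret a signed 32-bit process exit status as the unsigned
// status code that Windows debuggers display. (Hypothetical helper.)
std::uint32_t toNtStatus(std::int32_t exitStatus) {
    return static_cast<std::uint32_t>(exitStatus);
}

// Format a status code as 0xXXXXXXXX. (Hypothetical helper.)
std::string toHex(std::uint32_t status) {
    char buf[11];  // "0x" + 8 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "0x%08X", static_cast<unsigned>(status));
    return std::string(buf);
}
```

Calling toHex(toNtStatus(-1073741571)) gives 0xC00000FD, the stack overflow code from the exception message, so both messages describe the same failure inside be.exe.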
The default stack size of an .exe binary can be increased with some tools.
Google for “windows editbin”.
Christian
I found it: --opencc-options="-O0"
But now the compiler is much worse at detecting which memory space a given pointer points to (shared or global).
If you compile the following code with -O0, it will crash or print 123. With -O2 it will yield the correct result, 345.
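#include <cstdio>

// Thin wrapper around a raw pointer; the compiler must work out
// whether data points to shared or global memory.
class MyClass {
public:
    int *data;
    __device__ MyClass(int *src) {
        data = src;
    }
    __device__ int &at(int idx) {
        return data[idx];
    }
};

__global__ void func2(int *out) {
    __shared__ int arr[256];
    arr[0] = 123;
    MyClass x(arr);   // wraps a pointer to shared memory
    x.at(0) = 345;    // should overwrite arr[0]
    *out = arr[0];    // expected: 345
}

void isolatedProblem2() {
    int *gpuPtr;
    cudaMalloc((void**)&gpuPtr, sizeof(int));
    func2<<<1,1>>>(gpuPtr);
    int hVar;
    // Copy the single result back to the host and print it.
    cudaMemcpy(&hVar, gpuPtr, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d\n", hVar);
}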
I might end up modifying the stack size as you suggested, cbuchner1 - thank you.
I just wanted to find a way where I don’t have to mess with executables…
OK, I increased the stack size of \open64\lib\be.exe from 2M to 16M and now it seems to be working. Again, many thanks for the hint!
Nevertheless, I would like to express my discontent that such ugly tricks are needed to make the compiler work!
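For anyone repeating this, it may help to confirm that the patched binary really carries the larger stack reservation. The sketch below reads the SizeOfStackReserve field from a PE image that has already been loaded into memory, using the field offsets from the published PE/COFF layout. The function names readLE and stackReserve are made up for illustration, and the code assumes an ordinary PE32 or PE32+ file with no unusual header tricks.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian unsigned integer of `width` bytes at offset `off`.
// Returns false if the read would run past the end of the buffer.
static bool readLE(const std::vector<std::uint8_t>& b, std::uint64_t off,
                   std::size_t width, std::uint64_t& out) {
    if (off > b.size() || width > b.size() - off) return false;
    out = 0;
    for (std::size_t i = 0; i < width; ++i)
        out |= static_cast<std::uint64_t>(
                   b[static_cast<std::size_t>(off) + i]) << (8 * i);
    return true;
}

// Return SizeOfStackReserve in bytes, or 0 if the buffer does not look
// like a PE32 / PE32+ image. (Hypothetical helper, not a toolkit API.)
std::uint64_t stackReserve(const std::vector<std::uint8_t>& img) {
    std::uint64_t mz = 0, peOff = 0, sig = 0, magic = 0, reserve = 0;
    if (!readLE(img, 0, 2, mz) || mz != 0x5A4D) return 0;           // "MZ"
    if (!readLE(img, 0x3C, 4, peOff)) return 0;                     // e_lfanew
    if (!readLE(img, peOff, 4, sig) || sig != 0x00004550) return 0; // "PE\0\0"
    const std::uint64_t opt = peOff + 4 + 20;  // skip signature + COFF header
    if (!readLE(img, opt, 2, magic)) return 0;
    // The field sits at offset 72 of the optional header in both formats:
    // 4 bytes wide in PE32 (magic 0x10B), 8 bytes wide in PE32+ (0x20B).
    std::size_t width = 0;
    if (magic == 0x10B) width = 4;
    else if (magic == 0x20B) width = 8;
    else return 0;
    return readLE(img, opt + 72, width, reserve) ? reserve : 0;
}
```

To use it, read be.exe into a std::vector<std::uint8_t> (for example with std::ifstream in binary mode) and pass it in; after the change described above the result should be 16777216, i.e. 16M. Assuming editbin was the tool used, its /STACK option takes the reserve size in bytes.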