I do know what a stack overflow error is, but my question is: why am I getting it?
Here is the code. It basically just tests whether I can allocate 1000 * 1000 integers on the GPU.
#include <iostream>
#include <cuda_runtime.h>
using namespace std;

void test()
{
    const int N = 1000 * 1000;
    int a[N] = {0};
    int *dev;
    cudaMalloc( (void**) &dev, N * sizeof(int));
    cudaMemcpy( dev, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cout << "Allocated successfully" << endl;
}

int main()
{
    test();
    cin.get();
    return 0;
}
Now, clearly I can't allocate an array of size 1000 * 1000, since I'm getting the error. BUT my total GPU memory is 513MB, and sizeof(int) returns 4 bytes with my compiler.
So if we do the math:
(1000 * 1000 * 4) / 1024 / 1024 = 3.8MB
What is wrong? Can anyone explain to me what I'm doing wrong?
Hi,
The problem is not on the GPU side, but on the CPU one: “int a[N];”
Here a is an automatic array, so it is allocated on the stack (in CPU memory). That's where the overflow occurs, I suspect.
Not so sure…
The cudaMemcpy call is the first one where you really try to access the array a, so this might trigger the error.
Just for the sake of it, try to allocate it dynamically (new/delete) instead and let me know.
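void test() {
    const int N = 1000 * 1000;
    int *a = new int[N]();   // heap allocation, zero-initialised like the original {0}
    int *dev;
    cudaMalloc( (void**) &dev, N * sizeof(int));
    cudaMemcpy( dev, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cout << "Allocated successfully" << endl;
    cudaFree(dev);           // release the device buffer
    delete[] a;              // memory from new[] must be released with delete[]
}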
Hey…
It WORKED! No stack overflow exception. Could you please explain to me why dynamically allocating the array and then passing its contents to the GPU works and
having
Cool!
It’s because automatic arrays (like “int a[N];”) are allocated on the stack, and dynamic ones (like “int *a=new int[N];”) are allocated on the heap.
The stack is a fairly small memory area (the default is typically about 1MB on Windows and 8MB on Linux), so a 3.8MB array may not fit, whereas the heap basically allows for all the available memory on the machine, and more (with virtual memory, but this is not the place for explaining that).
Bottom line: when you need large chunks of memory, allocate them on the heap, not on the stack.
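As an illustration of the heap approach, here is a minimal sketch that uses std::vector for the host-side buffer, so the memory is released automatically. The helper names make_host_buffer and buffer_bytes are made up for this example and are not part of CUDA; the sketch is host-only so it compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CUDA function): builds a zero-filled
// host buffer of n ints.
std::vector<int> make_host_buffer(std::size_t n)
{
    assert(n > 0);
    // std::vector stores its elements on the heap; only a small
    // handle (pointer plus bookkeeping) lives on the stack.
    return std::vector<int>(n, 0);
}

// Hypothetical helper: size of the buffer in bytes, i.e. the value
// one would pass as the byte count to cudaMalloc / cudaMemcpy.
std::size_t buffer_bytes(const std::vector<int>& buf)
{
    return buf.size() * sizeof(int);
}
```

With CUDA available, the buffer could then be copied with cudaMemcpy(dev, buf.data(), buffer_bytes(buf), cudaMemcpyHostToDevice), and no explicit delete[] is needed for the host side.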
Hey,
That makes perfect sense! I know the difference between stack and heap, coming from a Java background, but I just did not realize that fixed-size local arrays are allocated on the stack in C++/CUDA.
So you were right in the first post when you said that the error is not GPU specific.
Thanks a million.
I had a similar problem with the declaration, and the program was crashing at the cudaMemcpy line. I worked around it by using “static int a[N]”, which moves the array off the stack. This works up to a point.
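To show why the static declaration helps, here is a rough host-only sketch. The names static_buffer and element are invented for this example, and N is just the size from the original question.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t N = 1000 * 1000;

// Hypothetical helper: returns a pointer to an array with static
// storage duration.
int* static_buffer()
{
    // 'static' reserves the array for the whole run of the program,
    // outside the call stack, and zero-initialises it. The size is
    // fixed at compile time and every call shares the same array.
    static int a[N];
    return a;
}

// Hypothetical helper: bounds-checked access to one element.
int& element(std::size_t i)
{
    assert(i < N);
    return static_buffer()[i];
}
```

The trade-off is that there is only one such array, shared by every call to the function, which is why a heap allocation is usually the more flexible choice.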
When I used the declaration int a[N], my code crashed for large N when I tried to copy from device to host. After I changed the declaration in