Problem with vector add example in "CUDA by Example" book


I am trying to learn CUDA and am reading one of the recommended books, “CUDA by Example” by Sanders and Kandrot. Chapter 5 has a vector addition example. Basically, the program takes two vectors of a given length and computes their element-wise sum.

The example by itself works fine, but if I increase the number of elements in the vector, the program crashes.

The following works fine:

#define NUMBER_OF_ELEMENT	(64*1024)

The following will crash:

#define NUMBER_OF_ELEMENT	(128*1024)

I am also attaching the complete code.

I am running the program on an XPS 15, which has a 525M with 1 GB of memory. I don't think it is limited by the memory size. Also, the example is designed to work with large vectors. So, what is wrong?

Thanks a lot for your time!

The host arrays declared in the CPU code are most likely leading to a stack overflow. Replace them with dynamic memory allocation (malloc, new, std::vector, etc.) and you should be OK.
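To see why 64*1024 elements work and 128*1024 do not, it helps to add up the bytes the three host arrays need. This is a minimal sketch, assuming the arrays hold `int` and the stack is limited to 1 MB (a common default on Windows; other systems differ). The names `host_arrays_bytes`, `fits_on_assumed_stack`, `report`, and `ASSUMED_STACK_LIMIT` are made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed stack limit of 1 MB. An illustration value, not queried from the OS. */
#define ASSUMED_STACK_LIMIT ((size_t)1024 * 1024)

/* Bytes needed by three int arrays (a, b, c) of n elements each. */
size_t host_arrays_bytes(size_t n)
{
    return 3 * n * sizeof(int);
}

/* Returns 1 if the three arrays stay under the assumed limit, 0 otherwise. */
int fits_on_assumed_stack(size_t n)
{
    return host_arrays_bytes(n) < ASSUMED_STACK_LIMIT;
}

/* Prints the total size and the verdict for one element count. */
void report(size_t n)
{
    printf("%zu elements: %zu bytes, fits: %d\n",
           n, host_arrays_bytes(n), fits_on_assumed_stack(n));
}
```

With a 4-byte `int`, `report(64 * 1024)` gives 786432 bytes (768 KB), which fits, while `report(128 * 1024)` gives 1572864 bytes (1.5 MB), which is over the assumed limit.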

Thanks a lot!


Could somebody give an example of exactly how to make it work, then?

Many thanks!

int *a = (int *)malloc(NUMBER_OF_ELEMENT * sizeof(int));

int *b = (int *)malloc(NUMBER_OF_ELEMENT * sizeof(int));

int *c = (int *)malloc(NUMBER_OF_ELEMENT * sizeof(int));

do your stuff, then call free(a), free(b), and free(c) when you are done
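For a fuller picture, here is a minimal sketch of the host side with heap-allocated vectors, including the NULL checks and the matching `free` calls. It is not the book's code: the function names `run_vector_add` and `add_on_host` are hypothetical, and a plain host loop stands in for the kernel so the sketch runs without a GPU. In the real program, the `cudaMemcpy` calls and the kernel launch would go where the comment indicates. The casts on `malloc` are kept so the code also compiles as C++ under nvcc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Host-side stand-in for the GPU kernel, so the sketch runs without a GPU. */
static void add_on_host(const int *a, const int *b, int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Allocates three vectors on the heap, adds them, and checks the result.
   Returns 0 on success, 1 on allocation failure, 2 on a wrong sum. */
int run_vector_add(size_t n)
{
    int *a = (int *)malloc(n * sizeof(int));
    int *b = (int *)malloc(n * sizeof(int));
    int *c = (int *)malloc(n * sizeof(int));
    int status = 0;

    if (a == NULL || b == NULL || c == NULL) {
        status = 1;
    } else {
        for (size_t i = 0; i < n; i++) {
            a[i] = (int)i;
            b[i] = 2 * (int)i;
        }

        /* In the CUDA program, the cudaMemcpy calls and kernel launch go here. */
        add_on_host(a, b, c, n);

        for (size_t i = 0; i < n; i++) {
            if (c[i] != 3 * (int)i) {
                status = 2;
                break;
            }
        }
    }

    free(a); /* free(NULL) is a no-op, so this is safe on the failure path */
    free(b);
    free(c);
    return status;
}
```

Calling `run_vector_add(128 * 1024)` from `main` exercises the size that crashed with stack arrays. Because the size is a runtime argument, it is no longer tied to a compile-time constant.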




I solved a similar problem by using static arrays. They are defined like this:

static int a[N], b[N], c[N];

where N is the number of elements.
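Static arrays help because objects with static storage duration are not placed on the stack. Below is a minimal sketch of that approach, with N set to the element count that crashed in the question and a host loop standing in for the kernel. The function name `static_vector_add` is hypothetical.

```c
#include <stddef.h>

#define N (128 * 1024)

/* Static storage duration: reserved for the whole program run, not on the stack. */
static int a[N], b[N], c[N];

/* Fills a and b, adds them into c, and returns the number of wrong entries. */
int static_vector_add(void)
{
    int errors = 0;

    for (size_t i = 0; i < (size_t)N; i++) {
        a[i] = (int)i;
        b[i] = 2 * (int)i;
    }

    /* Host-side stand-in for the cudaMemcpy calls and the kernel launch. */
    for (size_t i = 0; i < (size_t)N; i++)
        c[i] = a[i] + b[i];

    for (size_t i = 0; i < (size_t)N; i++)
        if (c[i] != 3 * (int)i)
            errors++;

    return errors;
}
```

The trade-off is that N must be known at compile time, whereas the malloc version can pick the size at run time.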