I have a CUDA function that computes a simple operation on an array:
__global__ void funct(int *v, int *dest) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    dest[idx] = 3 * v[idx] + 4;
}
v is an initialized input vector and dest is the destination vector.
When I run the program on arrays of 1,000, 10,000, or 100,000 elements there is no problem. On the other hand, when the array has 1,000,000 elements an error occurs: cudaErrorMemoryAllocation.
How do I solve the problem?
For 1,000,000 elements the kernel launch parameters are: DIM_GRID(1954, 1), DIM_BLOCK(512, 1, 1).
1954 × 512 = 1,000,448, so 448 threads fall outside the array and you must guard against those out-of-range indices, for example:
#define N 1000000

__global__ void funct(int *v, int *dest) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= N)
        return;    // out-of-range threads do nothing
    dest[idx] = 3 * v[idx] + 4;
}
Or pass N as an argument of the kernel instead of hard-coding it.
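A minimal sketch of that variant, passing the element count to the kernel and computing the grid size with ceiling division (the kernel name funct and the size N come from the question; the launch wrapper and BLOCK_SIZE are just illustrative):

```cuda
#define N 1000000
#define BLOCK_SIZE 512

// The element count n is passed in, so the same kernel works for any size.
__global__ void funct(int *v, int *dest, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)    // guard: the extra threads in the last block do nothing
        dest[idx] = 3 * v[idx] + 4;
}

// Host side: round the grid size up so every element gets a thread.
// (N + BLOCK_SIZE - 1) / BLOCK_SIZE == 1954 for N == 1000000.
void launch(int *v_dev, int *dest_dev) {
    int gridSize = (N + BLOCK_SIZE - 1) / BLOCK_SIZE;
    funct<<<gridSize, BLOCK_SIZE>>>(v_dev, dest_dev, N);
}
```

With the guard inside the kernel, the over-provisioned last block is harmless.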
You should also show how you allocate the memory. It should look like:
int *v_dev, *dest_dev;
cudaMalloc((void **)&v_dev, sizeof(int) * N);
cudaMalloc((void **)&dest_dev, sizeof(int) * N);
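Whatever the sizes, it is worth checking the return code of each cudaMalloc: cudaErrorMemoryAllocation is reported at the failing allocation, not at some later kernel launch. A sketch of that check, using the variable names from the snippet above:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define N 1000000

int main(void) {
    int *v_dev = NULL, *dest_dev = NULL;

    // Check every allocation: this is where cudaErrorMemoryAllocation
    // actually shows up.
    cudaError_t err = cudaMalloc((void **)&v_dev, sizeof(int) * N);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc(v_dev) failed: %s\n",
                cudaGetErrorString(err));
        return EXIT_FAILURE;
    }
    err = cudaMalloc((void **)&dest_dev, sizeof(int) * N);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc(dest_dev) failed: %s\n",
                cudaGetErrorString(err));
        cudaFree(v_dev);
        return EXIT_FAILURE;
    }

    /* ... copies and kernel launch go here ... */

    cudaFree(v_dev);
    cudaFree(dest_dev);
    return EXIT_SUCCESS;
}
```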
Two int arrays of 1,000,000 elements are only about 8 MB in total, which should be far below the amount of memory available on your GPU, but cudaErrorMemoryAllocation is the error you get when you try to allocate more than the GPU can provide.
Can you tell us the card, the operating system, and the number of screens you are working with?
You can also check the available memory in nvidia-settings (if you are on Linux).
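Alternatively, you can query the free and total device memory directly from code with cudaMemGetInfo, which works on any OS with the CUDA runtime (a small standalone sketch):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    size_t freeBytes = 0, totalBytes = 0;

    // Ask the CUDA runtime how much device memory is currently free.
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }
    printf("GPU memory: %zu MB free of %zu MB total\n",
           freeBytes >> 20, totalBytes >> 20);
    return 0;
}
```

Printing this just before the failing cudaMalloc tells you whether the GPU really is out of memory at that point.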