I tried to allocate an array of 16*64*340*340 = 118,374,400 doubles via
cudaMalloc((void **) &arr, sizeof(double)*64*16*340*340);
or via
cudaMallocPitch((void **) &arr, &pitch_arr, sizeof(double)*64*16, 340*340);
and in both cases I got a failure ("unspecified launch failure") in a test kernel that just writes 1.0 to the array elements:
__global__ void kernel_fill(double *arr) {
    int x = blockIdx.x;
    int y = blockIdx.y;
    for (int i = 0; i < 64*16; i++) {
        arr[(y*340 + x)*64*16 + i] = 1.0;
    }
}
kernel_fill<<<dim3(340,340),dim3(1,1)>>>(arr);
If I decrease the array length, e.g. by a factor of 16, it works correctly.
Is there some limit on 1D (cudaMalloc) or 2D (cudaMallocPitch) array sizes in CUDA for compute capability 2.0?
Yes, it's a Tesla with 6 GB (ECC disabled). The block/grid configuration is not efficient, it was chosen just for readability. Other grid configurations give the same result: failure for the huge array, success for smaller sizes.