I have defined M = 250 and N = 250 and I try to compute int bigsize = (M-1)*(N-1)*(M-1)*(N-1); but I am getting a strange result in the console, like
M-1 = 249 N-1 = 249 bigsize = -450843295
What is the problem here?
When I multiply (M-1)*(N-1)*(M-1), I get 15438249, which is correct.
But when I include math.h and calculate int bigsize = pow(N-1, 2)*pow(M-1, 2), I am getting bigsize = 2147483647, but it should be 3844124001.
For M=250 and N=250, the expression (M-1)*(N-1)*(M-1)*(N-1) mathematically results in a value between 2^31 and 2^32, which is too large to be representable in a 32-bit signed ‘int’. The result of signed integer overflow is undefined; it could be anything.
Here you could get by with an unsigned ‘int’:
#define M (250U)
#define N (250U)
unsigned int bigsize = (M-1)*(N-1)*(M-1)*(N-1);
For the general case, it would be best to perform such computations with ‘size_t’ operands, as you could easily overflow the range of 32-bit unsigned integers as well. Side remark: you would never want to invoke pow() simply to square operands.
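A minimal sketch of that suggestion, assuming the same M and N as in your post (plain multiplications replace the pow() calls):

#include <stdio.h>

#define M (250)
#define N (250)

int main(void)
{
    /* Casting the first operand to size_t makes the whole product
       evaluate in a 64-bit unsigned type (on a 64-bit platform). */
    size_t bigsize = (size_t)(M-1) * (N-1) * (M-1) * (N-1);
    printf("bigsize = %zu\n", bigsize);   /* prints 3844124001 */
    return 0;
}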
But
unsigned int linsizeA2 = (N-1)*(M-1)*(N-1)*(M-1);
checkCuda(cudaMalloc((void**) &A2l, linsizeA2*sizeof(double)));
is giving an error like this:
$ cuda-memcheck ./hesttry2 |more
GPUassert: out of memory hestry2.cu 931
free memory = 11926175744 total memory = 12079136768
========= CUDA-MEMCHECK
========= Program hit cudaErrorMemoryAllocation (error 2) due to “out of memory” on CUDA API call to cudaMalloc.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib64/libcuda.so.1 [0x2e40d3]
========= Host Frame:./hesttry2 [0x43549]
========= Host Frame:./hesttry2 [0x52f9]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xfd) [0x1ed1d]
========= Host Frame:./hesttry2 [0x25b9]
========= ERROR SUMMARY: 1 error
I have tried with size_t instead of unsigned int, but I am getting the same result.
Now 11926175744/((N-1)*(M-1)*(N-1)*(M-1)) = 3.10, but a double variable takes 8 bytes; is this the reason, or am I making some other mistake?
Also note that you still haven’t handled the calculation correctly. If your final result is ~30GB (i.e. larger than 2^32 bytes), you should use a 64-bit type such as size_t (on a 64-bit platform) instead of this:
linsizeA2*sizeof(double)
But this wouldn’t “resolve” your issue because the request to allocate 30GB of memory is going to fail on any currently available CUDA GPU that I am aware of.
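For illustration only, here is a rough sketch of how the byte count could be computed in a 64-bit type and checked against the free device memory before the cudaMalloc call. It is a fragment, not a complete program: it assumes A2l is already declared as a double*, that checkCuda is your existing error-checking helper, and that the usual headers (cuda_runtime.h, stdio.h) are included:

size_t linsizeA2 = (size_t)(N-1) * (M-1) * (N-1) * (M-1);
size_t bytes = linsizeA2 * sizeof(double);   /* ~30.75 GB for M = N = 250 */

size_t freeMem = 0, totalMem = 0;
checkCuda(cudaMemGetInfo(&freeMem, &totalMem));
if (bytes > freeMem) {
    /* On a ~12 GB card this branch is taken, matching the error you see. */
    fprintf(stderr, "need %zu bytes but only %zu bytes are free\n", bytes, freeMem);
    /* reduce the problem size or split the allocation here */
} else {
    checkCuda(cudaMalloc((void**) &A2l, bytes));
}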