Hi all,
Does anybody know how to manage data that does not fit inside the memory of the GPU? Is there any high-level function included in CUDA that handles the allocation?
I tried to evaluate the performance of a simple CUBLAS Sgemm operation on my system (a GTX 260 GPU) and I noticed that when my matrices A, B and C (with A*B=C) are around 16000x16000 the program crashes. I think this is a memory problem: at 4 bytes per single-precision element, a 16000x16000 matrix takes about 1 GByte, and if I remember correctly the GTX 260 only has 896 MByte of device memory, so even a single matrix does not fit, let alone all three. Indeed, the code works properly with 8000x8000 matrices (in that case a single matrix occupies about 256 MByte).
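In case it is useful, this is roughly how I would confirm that it is really the allocation that fails rather than the Sgemm itself; a quick, untested sketch using the runtime API:

```
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t n = 16000UL * 16000UL;  /* elements in one 16000x16000 matrix */
    float *d_A;

    /* a single ~1 GByte request; on a GTX 260 this should already fail */
    cudaError_t err = cudaMalloc((void**)&d_A, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaFree(d_A);
    return 0;
}
```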
I thought that the only way is to subdivide the matrices and perform the multiplication by blocks, and I suppose I would have to manage by myself the block size, the block sub-multiplications and the arrangement of the final result (something along the lines of the sketch below).
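To make the idea concrete, here is what I have in mind, as an untested sketch: C is computed one T x T tile at a time, accumulating the partial products A(i,k)*B(k,j) on the device with cublasSgemm. It uses the legacy CUBLAS interface and assumes square N x N matrices in column-major order with N divisible by T; error checking is omitted for brevity.

```
#include <stdio.h>
#include "cublas.h"   /* legacy CUBLAS API */

/* C = A * B, all N x N and column-major, tiled so that only three
   T x T blocks ever live on the device at the same time.
   Assumes cublasInit() has already been called. */
void blocked_sgemm(const float *A, const float *B, float *C, int N, int T)
{
    float *dA, *dB, *dC;
    cublasAlloc(T * T, sizeof(float), (void**)&dA);
    cublasAlloc(T * T, sizeof(float), (void**)&dB);
    cublasAlloc(T * T, sizeof(float), (void**)&dC);

    for (int i = 0; i < N / T; ++i) {          /* tile row of C    */
        for (int j = 0; j < N / T; ++j) {      /* tile column of C */
            for (int k = 0; k < N / T; ++k) {  /* inner dimension  */
                /* upload the A(i,k) and B(k,j) sub-blocks
                   (host leading dimension is N, device tiles are T x T) */
                cublasSetMatrix(T, T, sizeof(float),
                                A + i*T + (size_t)k*T*N, N, dA, T);
                cublasSetMatrix(T, T, sizeof(float),
                                B + k*T + (size_t)j*T*N, N, dB, T);
                /* the first partial product overwrites dC (beta = 0),
                   the following ones accumulate into it (beta = 1) */
                cublasSgemm('N', 'N', T, T, T, 1.0f, dA, T, dB, T,
                            k == 0 ? 0.0f : 1.0f, dC, T);
            }
            /* copy the finished C(i,j) tile back to the host */
            cublasGetMatrix(T, T, sizeof(float),
                            dC, T, C + i*T + (size_t)j*T*N, N);
        }
    }
    cublasFree(dA); cublasFree(dB); cublasFree(dC);
}
```

With N=16000 and T=4000 this needs three 64 MByte blocks on the device, which should fit comfortably, but I have no idea whether the transfer overhead kills the performance, which is part of why I am asking if CUDA already provides something for this.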
In the meanwhile, I was wondering if CUDA provides a function that handles situations where one tries to allocate more memory than is available on the device, or at least something that reports the maximum amount of memory that can be allocated at that moment.
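The closest thing I have found so far is cuMemGetInfo / cudaMemGetInfo, which report the free and total device memory; I am not sure it is available in every CUDA version, so treat this as an assumption:

```
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    size_t free_bytes, total_bytes;

    /* queries the current free and total memory on the active device */
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed\n");
        return 1;
    }
    printf("free: %lu MByte, total: %lu MByte\n",
           (unsigned long)(free_bytes >> 20),
           (unsigned long)(total_bytes >> 20));
    return 0;
}
```

Note that the free memory is not the same as the largest single allocatable block, since the free space can be fragmented.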
Thank you for any help.
Bye,
Pietro