Until recently I was under the impression that I should be able to allocate all of the GPU memory.
The GTX 285 has 16373 x 64 KiB of total global memory, as shown by deviceQuery.
That’s a little over 1023 MiB. When allocating memory in 1 MiB steps in a simple loop (see the code below),
I can allocate at most 984 MiB. Thus I’m missing 629 x 64 KiB (about 39 MiB).
On a GTX 480, which has 24569 x 64 KiB total global memory (about 1535 MiB), I can
allocate 1429 MiB, thus missing 1705 x 64 KiB (about 107 MiB).
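As a quick check of the arithmetic above, here is a minimal sketch that recomputes the missing amounts from the reported numbers. It does not touch the GPU; the program name and all variable names are illustrative only and are not part of the test program in this post.

```fortran
PROGRAM MISSING_MEM
IMPLICIT NONE
! One block is 64 KiB, so 16 blocks make up 1 MiB.
INTEGER, PARAMETER :: BLOCKS_PER_MIB = 16
INTEGER :: TOTAL(2), ALLOC_MIB(2), MISSING, I
CHARACTER(LEN=7) :: CARD(2)

! Figures taken from the post (hypothetical variable names)
CARD = (/ 'GTX 285', 'GTX 480' /)
TOTAL = (/ 16373, 24569 /)      ! total global memory in 64 KiB blocks
ALLOC_MIB = (/ 984, 1429 /)     ! largest amount allocated, in MiB

DO I = 1, 2
   ! Number of 64 KiB blocks that could not be allocated
   MISSING = TOTAL(I) - ALLOC_MIB(I) * BLOCKS_PER_MIB
   WRITE(*,'(A,A,I6,A,F7.2,A)') CARD(I), ': missing ', MISSING, &
      ' x 64 KiB = ', REAL(MISSING) / BLOCKS_PER_MIB, ' MiB'
END DO
END PROGRAM MISSING_MEM
```

This gives 629 blocks (about 39.3 MiB) for the GTX 285 and 1705 blocks (about 106.6 MiB) for the GTX 480, matching the figures above.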
Has anybody else observed this behaviour? Does somebody know what this is like on
the Tesla C2050 or C2070? Can somebody give an explanation for this?
I tested this with CuBLAS 3.2 RC and the latest 64-bit Linux driver, using the following Fortran program
and the standard Fortran wrapper.
PROGRAM TEST_MEM
IMPLICIT NONE
INTEGER GPURAM, MIBYTE
PARAMETER (GPURAM=1023, MIBYTE=1024*1024)
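EXTERNAL CUBLAS_INIT, CUBLAS_SHUTDOWN, CUBLAS_ALLOC, CUBLAS_FREE
INTEGER*4 CUBLAS_INIT, CUBLAS_SHUTDOWN, CUBLAS_ALLOC, CUBLAS_FREE
INTEGER*8 DEVLOCB
! STAT holds return codes, IX counts allocations (both must be declared under IMPLICIT NONE)
INTEGER STAT, IX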
! > Initialize CuBLAS
STAT = CUBLAS_INIT()
IF (STAT .NE. 0) WRITE(*,*) 'cublas init failed'
IX = 1
100 STAT = CUBLAS_ALLOC(1,MIBYTE, DEVLOCB)
IF (STAT .NE. 0) THEN
   WRITE(*,*) 'allocation failed for ix =',IX
ELSE
   IX = IX + 1
   GOTO 100
END IF
! > Shutdown CuBLAS
STAT = CUBLAS_SHUTDOWN()
IF (STAT .NE. 0) WRITE(*,*) 'cublas shutdown failed'
END PROGRAM TEST_MEM