Hi,
When using the Fortran wrapper provided with the SDK, one needs to declare the device pointers as INTEGER*8 on a 64-bit Linux machine.
The documentation for the CuBLAS 3.2 RC library states
The following program demonstrates that this is not the case.
[codebox]
      PROGRAM TEST_ALLOC
      IMPLICIT NONE
!
      INTEGER GPURAM, MIBYTE
      PARAMETER (GPURAM=1023, MIBYTE=1024*1024)
      INTEGER IX,STAT
      EXTERNAL CUBLAS_INIT, CUBLAS_SHUTDOWN, CUBLAS_ALLOC, CUBLAS_FREE
      INTEGER*4 CUBLAS_INIT, CUBLAS_SHUTDOWN, CUBLAS_ALLOC, CUBLAS_FREE
      INTEGER*4 DEVLOCA(2)
      INTEGER*8 DEVLOCB
! > Initialize CuBLAS
      STAT = CUBLAS_INIT()
      IF (STAT .NE. 0) WRITE(*,*) 'cublas init failed'
! > Preset field for device location
      DEVLOCA(1) = -1
      DEVLOCA(2) = -1
      WRITE(*,*) ' DevLocA(1),DevLocA(2) =',DEVLOCA(1),DEVLOCA(2)
      STAT = CUBLAS_ALLOC(1, MIBYTE, DEVLOCA)
      IF (STAT .EQ. 0) THEN
         WRITE(*,*) ' DevLocA(1),DevLocA(2) =',DEVLOCA(1),DEVLOCA(2)
         STAT = CUBLAS_FREE(DEVLOCA)
      ELSE
         WRITE(*,*) 'allocation failed'
      END IF
! > Shutdown CuBLAS
      STAT = CUBLAS_SHUTDOWN()
      IF (STAT .NE. 0) WRITE(*,*) 'cublas shutdown failed'
!
      END
[/codebox]
I compiled the Fortran wrapper file with
gcc -I /opt/cuda/include/ -o cublas_wrapper.o -c fortran.c
and built the executable with
gfortran -fno-second-underscore -I/opt/cuda/include -L/opt/cuda/lib64 -lcublas -lcudart -L/usr/lib64 -lgfortran cublas_wrapper.o testalloc.f
Running it gives the following output:
heinemey@gpu-1:~/tmp/nvidia> a.out
DevLocA(1),DevLocA(2) = -1 -1
DevLocA(1),DevLocA(2) = 1048576 0
Both 32-bit words of DEVLOCA were overwritten by the call, which shows that cublas_alloc expects a 64-bit integer for the device pointer.
The problem, it seems to me, is that the definition of devptr_t in fortran.h changed from version 3.1 to 3.2:
CuBLAS 3.2: typedef size_t devptr_t;
CuBLAS 3.1: typedef uint devptr_t;
but the documentation was not updated to reflect this.
Cheers,
Eric