cublas_sdot bug? sdot should be single precision

cublas_sdot appears to deliver a double precision result. Shouldn’t this be a single precision BLAS function?
(cublas_ddot is the double precision function.)
System: CUDA 2.0 on Linux, RedHat EL5 x86_64.

Test code follows. The answer should be 204, but the result is wrong when cublas_sdot is declared real.

  program sdot_test
  implicit real (a-h,o-z)
  integer*4 size,dev_x
  double precision cublas_sdot

c real cublas_sdot
parameter (n=8,size=4)
dimension y(n),z(n)
call cublas_init()
do j=1,n
call cublas_Alloc(n,size,dev_x)
call cublas_Set_Vector(n,size,y,1,dev_x,1)
call cublas_Get_Vector(n,size,dev_x,1,z(1),1)
print *,'s0,s1,s2',s0,s1,s2
call cublas_free(dev_x)

The problem is in the fortran.c wrapper, not in CUBLAS itself.
In fortran.c, replace the declaration
double CUBLAS_SDOT (const int *n, const float *x, const int *incx, float *y, const int *incy)
with the single precision one:
float CUBLAS_SDOT (const int *n, const float *x, const int *incx, float *y, const int *incy)

Rebuilding the wrapper and the test then gives the correct results:

gcc -c fortran.c -I/usr/local/cuda/include
g95 --no-second-underscore sdot_test.f90 fortran.o -L/usr/local/cuda/lib -lcublas -lmkl -lguide -lpthread

./a.out (with the real cublas_sdot declaration in your source code)

s0,s1,s2 204. 204. 204.
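
To see why the mismatch matters, here is a minimal host-only sketch of what a caller expecting a float might read when the callee actually returned a double. It assumes a little-endian machine and an ABI such as x86_64 System V, where both types come back in the low bits of the same register; the helper name low_bytes_as_float is made up for this illustration and is not part of fortran.c or CUBLAS.

```c
#include <string.h>

/* Hypothetical helper: reinterpret the first 4 bytes of a double as a float.
 * On a little-endian machine these are the low 32 bits of the double, which
 * is what a caller expecting a float would read under the assumed ABI. */
float low_bytes_as_float(double returned)
{
    float seen;
    memcpy(&seen, &returned, sizeof seen);  /* copy only the first 4 bytes */
    return seen;
}
```

Under those assumptions, the low 32 bits of the double 204.0 are all zero, so low_bytes_as_float(204.0) comes back as 0.0 rather than 204.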

EDIT: The problem is due to the g77 calling convention:
Functions that return type default REAL actually return the C type
double, and functions that return type COMPLEX return the values via an
extra argument in the calling sequence that points to where to store the
return value.
If you are not using g77 but a more recent compiler (gfortran, g95, ifort, etc.), you should change some
of the defines in the fortran.c file so that the wrappers follow your compiler's convention.
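
As a rough sketch of that idea, the return type of a Fortran-callable wrapper can be selected at compile time. Everything named here is an assumption for illustration: SKETCH_F2C_ABI, sdot_ret_t and sketch_sdot_ are made-up names, not the actual defines or functions in fortran.c, and the body is a plain host loop standing in for the real call into CUBLAS.

```c
/* Hypothetical switch: define SKETCH_F2C_ABI when the Fortran compiler follows
 * the g77/f2c convention, where a default REAL function returns a C double. */
#ifdef SKETCH_F2C_ABI
typedef double sdot_ret_t;
#else
typedef float sdot_ret_t;   /* gfortran, g95, ifort style: REAL maps to float */
#endif

/* Host-only stand-in for a Fortran-callable SDOT wrapper.
 * All arguments are passed by reference, as Fortran does. */
sdot_ret_t sketch_sdot_(const int *n, const float *x, const int *incx,
                        const float *y, const int *incy)
{
    float acc = 0.0f;
    int i, ix, iy;

    if (*n <= 0)
        return (sdot_ret_t)0;

    /* BLAS convention: a negative increment walks the vector backwards. */
    ix = (*incx < 0) ? (1 - *n) * (*incx) : 0;
    iy = (*incy < 0) ? (1 - *n) * (*incy) : 0;

    for (i = 0; i < *n; ++i) {
        acc += x[ix] * y[iy];
        ix += *incx;
        iy += *incy;
    }
    return (sdot_ret_t)acc;  /* widened to double only under the f2c convention */
}
```

Built without SKETCH_F2C_ABI, this matches a Fortran caller that declares the function real; built with it, the function matches a g77-style caller instead.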