Strange behavior of CUBLAS 3.0 on RHEL 5.3

I’ve just installed CUDA 3.0 and wanted to test a simple program. I called cublas_sgemm to multiply two 2×2 matrices (B = A*A), but the routine appears to do nothing: the output matrix B is exactly the same as the input matrix A. It’s strange! Previously I tried CUBLAS 2.3 on Ubuntu 8.04, and this test passed.

I wondered whether the version was the cause, so I tested CUDA 2.3 on RHEL 5.3. This time the program crashed with a segmentation fault…

Does anyone know the reason? I’m going insane… Thanks…

[codebox]
      program matrixmod
      implicit none
      integer M, N
      parameter (M=2, N=2)
      real*4 a(M,N),b(M,N),c(M,N)
      integer i, j

      do j = 1, N
        do i = 1, M
          a(i,j) = (i-1) * M + j
        enddo
      enddo

      do j = 1, N
        do i = 1, M
          b(i,j) = (i-1) * M + j
        enddo
      enddo

      call cublas_sgemm('N','N',2,2,2,1.0,
     &        a,2,a,2,0.0,b,2)

      do j = 1, N
        do i = 1, M
          write(*,"(F7.0$)") b(i,j)
        enddo
        write (*,*) ""
      enddo

      write (*,*) ""

      do j = 1, N
        do i = 1, M
          write(*,"(F7.0$)") a(i,j)
        enddo
        write (*,*) ""
      enddo

      stop
      end
[/codebox]

Further information:
If I allocate device memory manually by calling cublas_alloc, it fails with “device memory allocation failed”, which didn’t happen on Ubuntu with CUDA 2.3.

Are you using the Fortran thunking interface or the regular interface?

In CUBLAS 3.0, fortran.c has been split into two parts: fortran.[c,h] (for the regular interface) and fortran_thunking.[c,h] (for the thunking interface).

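If you do not do the device allocation yourself, you should use the thunking interface. Its wrappers accept host arrays and handle the device allocation and the copies in both directions for you, whereas the regular interface expects device pointers.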
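For comparison, here is a rough sketch of what doing the device allocation yourself looks like with the regular (non-thunking) interface. It assumes the wrappers from fortran.c are compiled and linked in, and that device pointers are declared integer*8, which is my assumption for a 64-bit build (a 32-bit build would use a plain integer). The program name and the devA/devB variables are made up for illustration, and I have not run this on your setup.

```fortran
      program sgemmdev
      implicit none
      integer M, N, sizeof_real
      parameter (M=2, N=2, sizeof_real=4)
      real*4 a(M,N), b(M,N)
c     Device pointers. integer*8 is an assumption for 64-bit builds;
c     a 32-bit build would use a plain integer.
      integer*8 devA, devB
      integer i, j, stat
      external cublas_init, cublas_shutdown, cublas_free
      external cublas_alloc, cublas_set_matrix, cublas_get_matrix
      external cublas_sgemm
      integer cublas_alloc, cublas_set_matrix, cublas_get_matrix

c     Same host matrix as in the original post
      do j = 1, N
        do i = 1, M
          a(i,j) = (i-1) * M + j
        enddo
      enddo

      call cublas_init
      stat = cublas_alloc(M*N, sizeof_real, devA)
      if (stat .ne. 0) stop 'alloc of devA failed'
      stat = cublas_alloc(M*N, sizeof_real, devB)
      if (stat .ne. 0) stop 'alloc of devB failed'

c     Copy A to the device; the regular interface takes device pointers
      stat = cublas_set_matrix(M, N, sizeof_real, a, M, devA, M)
      if (stat .ne. 0) stop 'upload of A failed'

c     B = 1.0*A*A + 0.0*B on the device (k = M since A is square)
      call cublas_sgemm('N', 'N', M, N, M, 1.0, devA, M,
     &                  devA, M, 0.0, devB, M)

c     Copy the result back into the host array b
      stat = cublas_get_matrix(M, N, sizeof_real, devB, M, b, M)
      if (stat .ne. 0) stop 'download of B failed'

      call cublas_free(devA)
      call cublas_free(devB)
      call cublas_shutdown

c     Print b row by row
      do i = 1, M
        write(*,"(2F7.0)") (b(i,j), j = 1, N)
      enddo
      end
```

With a(1,1)=1, a(1,2)=2, a(2,1)=3, a(2,2)=4, the product A*A should come out as b(1,1)=7, b(1,2)=10, b(2,1)=15, b(2,2)=22. If you would rather keep passing host arrays straight to cublas_sgemm as in your program, link against the thunking wrappers instead and none of the allocation or copy calls are needed.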