# Is there any function in cublas limit the matrix size less than or equal to 48

I’m working on a program to do matrix Block Householder QR decomposition with cublas. The algorithm is showed in this picture:

``````!************* code... set dQ = I ***************

if(blockSize <= 1) blockSize = m
do k=1,m,blockSize
realSize=min(m-k+1, blockSize)
ldm = m-k+1
allocate(dW(ldm,realSize),dY(ldm,realSize),B(realSize),dv(ldm))
do j=k,k+realSize-1
ldl = m-j+1; ljk = j-k+1
call house(ldm,ljk,dA(k:m,j),dv,beta)
B(ljk) = beta
allocate(vvt(ldl,ldl))
call cublasSgemm('n','t',ldl,ldl,1,1.0,dv(ljk),ldm,dv(ljk),ldm,0.0,vvt,ldl)
call cublasSgemm('n','n',ldl,k+realSize-j,ldl,-1.0*beta,vvt,ldl,dA(j,j),lda,1.0,dA(j,j),lda)
deallocate(vvt)
!\$cuf kernel do <<<*,*>>>
do i=1,ldm
dY(i,ljk) = dv(i); dW(i,ljk) = -1.0*beta*dv(i)
if(i+k-1 > j) dA(i+k-1,j) = dv(i)
end do
end do
deallocate(dv); allocate(wy(ldm,ldm))
do i=2,realSize
call cublasSgemm('n','t',ldm,ldm,i-1,1.0,dW(1,1),ldm,dY(1,1),ldm,0.0,wy,ldm)
call cublasSgemm('n','n',ldm,1,ldm,-1.0*B(i),wy,ldm,dY(1,i),ldm,1.0,dW(1,i),ldm)
end do
if(k+realSize .LE. m) then
call cublasSgemm('n','t',ldm,ldm,realSize,1.0,dY,ldm,dW,ldm,0.0,wy,ldm)
call cublasSgemm('n','n',ldm,m-k-realSize+1,ldm,1.0,wy,ldm,dA(k,k+realSize),lda,1.0,dA(k,k+realSize),lda)
end if
call cublasSgemm('n','t',ldm,ldm,realSize,1.0,dW,ldm,dY,ldm,0.0,wy,ldm)
call cublasSgemm('n','n',m,ldm,ldm,1.0,dQ(1,k),m,wy,ldm,1.0,dQ(1,k),m)
deallocate(dW,dY,B,wy)
end do
!**********check result: dQ = dQ * R **************
call cublasStrmm('r','u','n','n',m,m,1.0,dA,m,dQ,lda)
print *,"Max error:",maxval(abs(A - dQ))
``````

Problem: the order of matrix A less than or equal to 48, the program could return right result.

Once the order bigger than 48, the result returned is wrong!

Error description:

the order of matrix A less than or equal to 48:

any value of blocksize between 1 to 48 could get right result

the order of matrix A bigger than 48:

if the blocksize less than 48, the first blocksize columns get the right result, the rest get wrong. if the blocksize equal to or bigger than 48, only the first 48 columns get right result, the rest get wrong.

It seems like somewhere in the code have a limit, limit the columns to 48 size

the cuBlas functions I used: cublasSgemm cublasStrmm cublasSnrm2 cublasSdot cublasSscal

do they have the limit ???

I have checked memory allocate / deallocate with stat info, no wrong message.