How to replace the following MKL functions with their CUBLAS equivalents

Dear all,

I have the following code, which originally called MKL functions:

The “header.h” file:

#include <complex>
using namespace std;

extern "C" void ZGEMM(char*, char*, int*, int*, int*, complex<double>*, complex<double>*, int*, complex<double>*, int*, complex<double>*, complex<double>*, int*);
extern "C" void ZTRSV(char* uplo, char* trans, char* diag, int* m, complex<double>* a, int* lda, complex<double>* x, int* incx);

Then I changed “header.h” as follows:

extern "C" void ZGEMM(char* transa, char* transb, int* m, int* n, int* k, cuDoubleComplex* alpha, cuDoubleComplex* a, int* lda, cuDoubleComplex* b, int* ldb, cuDoubleComplex* beta, cuDoubleComplex* c, int* ldc);
extern "C" void ZTRSV(char* uplo, char* trans, char* diag, int* m, cuDoubleComplex* a, int* lda, cuDoubleComplex* x, int* incx);

and added the following wrappers in the .cpp file:
void ZGEMM(char* transa, char* transb, int* m, int* n, int* k, cuDoubleComplex* alpha, cuDoubleComplex* a, int* lda, cuDoubleComplex* b, int* ldb, cuDoubleComplex* beta, cuDoubleComplex* c, int* ldc)
{
    cublasZgemm(*transa, *transb, *m, *n, *k, *alpha, a, *lda, b, *ldb, *beta, c, *ldc);
}

void ZTRSV(char* uplo, char* trans, char* diag, int* m, cuDoubleComplex* a, int* lda, cuDoubleComplex* x, int* incx)
{
    int *n;
    char *side;
    cuDoubleComplex *alpha;
    *n = 1;
    *side = 'L';
    (*alpha).x = 1.0;
    (*alpha).y = 0.0;
    cublasZtrsm(*side, *uplo, *trans, *diag, *m, *n, *alpha, a, *lda, x, *m);
}

In the main function, I added
cublasStatus status;
status = cublasInit();

and

status = cublasShutdown();

at the beginning and end of the main function respectively.

The code built successfully. However, it crashed when I ran the program. Could you please tell me what I have missed?

Thanks,
Zhanghong Tang

You must do one of two things: either build with the “thunking interface” turned on, or add memory management functions for the GPU, i.e. you must explicitly allocate memory on the GPU, explicitly copy the inputs from host memory to device memory, only then call the CUBLAS function, then copy the results back to host memory, and (at some point) free the GPU memory. All this is discussed in the CUBLAS documentation.

I have now changed the code to the following, but it still crashes. Could anyone point out the problem for me?

Thanks,

Zhanghong Tang

void ZGEMM(char* transa, char* transb, int* m, int* n, int* k, cuDoubleComplex* alpha, cuDoubleComplex* a, int* lda, cuDoubleComplex* b, int* ldb, cuDoubleComplex* beta, cuDoubleComplex* c, int* ldc)
{
    int size = sizeof(cuDoubleComplex);
    int sizeA, sizeB, sizeC;
    cuDoubleComplex *ad, *bd, *cd;

    if (*transa == 'N' || *transa == 'n')
    {
        sizeA = (*lda) * (*k);
    }
    else
    {
        sizeA = (*lda) * (*m);
    }

    if (*transb == 'N' || *transb == 'n')
    {
        sizeB = (*ldb) * (*n);
    }
    else
    {
        sizeB = (*ldb) * (*k);
    }

    sizeC = (*ldc) * (*n);

    cublasAlloc(sizeA, size, (void **)&ad);
    cublasAlloc(sizeB, size, (void **)&bd);
    cublasAlloc(sizeC, size, (void **)&cd);

    cublasSetVector(sizeA, size, a, 1, ad, 1);
    cublasSetVector(sizeB, size, b, 1, bd, 1);
    cublasSetVector(sizeC, size, c, 1, cd, 1);

    cublasZgemm(*transa, *transb, *m, *n, *k, *alpha, ad, *lda, bd, *ldb, *beta, cd, *ldc);

    cublasGetVector(sizeC, size, cd, 1, c, 1);

    cublasFree(ad);
    cublasFree(bd);
    cublasFree(cd);
}

void ZTRSV(char* uplo, char* trans, char* diag, int* m, cuDoubleComplex* a, int* lda, cuDoubleComplex* x, int* incx)
{
    int *n;
    char *side;
    cuDoubleComplex *alpha;
    cuDoubleComplex *ad, *xd;

    *n = 1;
    *side = 'L';
    (*alpha).x = 1.0;
    (*alpha).y = 0.0;

    int size = sizeof(cuDoubleComplex);
    int sizeA = (*lda) * (*m), sizeB = (*m) * (*n);

    cublasAlloc(sizeA, size, (void **)&ad);
    cublasAlloc(sizeB, size, (void **)&xd);

    cublasSetVector(sizeA, size, a, 1, ad, 1);
    cublasSetVector(sizeB, size, x, *incx, xd, 1);

    cublasZtrsm(*side, *uplo, *trans, *diag, *m, *n, *alpha, ad, *lda, xd, *m);

    cublasGetVector(sizeA, size, xd, 1, x, *incx);

    cublasFree(ad);
    cublasFree(xd);
}
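
A side note on the ZTRSV wrapper above, independent of the GPU question: n, side and alpha are declared as pointers and then written through (*n = 1, and so on) without ever pointing at any storage, which is undefined behaviour and could plausibly crash on its own. The copy back also transfers sizeA elements into x, although the solution holds only m*n elements. Below is a minimal sketch of the same setup using ordinary local values. DoubleComplexPair is a stand-in for cuDoubleComplex so that the sketch compiles without the CUDA headers, and TrsmScalars, makeTrsmScalars and solutionElementCount are hypothetical names, not part of CUBLAS.

```cpp
#include <cassert>

// Stand-in for cuDoubleComplex (a pair of doubles named x and y), so that this
// sketch compiles without the CUDA headers. Hypothetical name.
struct DoubleComplexPair {
    double x;  // real part
    double y;  // imaginary part
};

// Hypothetical helper type: the fixed arguments a TRSV-via-TRSM call needs.
struct TrsmScalars {
    int n;
    char side;
    DoubleComplexPair alpha;
};

// Ordinary local objects own their storage, so nothing is written through a
// pointer that was never pointed at anything.
TrsmScalars makeTrsmScalars() {
    TrsmScalars s;
    s.n = 1;          // one right-hand side: x is treated as an m-by-1 matrix
    s.side = 'L';     // A is applied from the left
    s.alpha.x = 1.0;  // alpha = 1 + 0i
    s.alpha.y = 0.0;
    return s;
}

// Number of elements to copy back for the solution: m * n, not lda * m.
int solutionElementCount(int m, const TrsmScalars& s) {
    assert(m >= 0);
    return m * s.n;
}
```

In the wrapper, these values would then be passed directly (s.side, s.n, s.alpha) in place of *side, *n and *alpha, and solutionElementCount(*m, s) would be the element count given to cublasGetVector.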

Dear avidday,

Thank you very much for your kind reply. I also noticed this problem and tried to change the code, but it still crashes. Could you please take a look at it?

I think the problem is “complex” versus “cuDoubleComplex”: the original definition uses “complex”, but I changed it to “cuDoubleComplex” (otherwise I get compile errors). How should complex data be handled?

In addition, could you please tell me how to turn on the “thunking interface” when building?

Thanks,

Zhanghong Tang

The thunking interface is discussed in some detail in Appendix A of the CUBLAS manual. It provides a transparent Fortran BLAS interface which hides all of the GPU specific things from your code. It is also very slow, so I wouldn’t recommend it, but it works if your first goal is just to get something that builds and runs.

I am afraid I cannot offer much help with the complex stuff. I use CUBLAS extensively, but only for real computations. My understanding is that the cuComplex type should just map directly to the C++/C99 complex type, but that is all I know about it.
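
To illustrate the mapping mentioned above: cuDoubleComplex is a pair of doubles with members x (real) and y (imaginary), and std::complex<double> stores its real and imaginary parts in that same order, so a buffer of one can be copied byte-for-byte into the other. The sketch below checks this with a stand-in struct, DoubleComplexPair, so that it builds without the CUDA headers; the struct and the helper toPairs are hypothetical names. As far as I know the real CUDA type also carries a 16-byte alignment requirement, which this stand-in does not model.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for cuDoubleComplex so the sketch builds without CUDA headers.
struct DoubleComplexPair {
    double x;  // real part
    double y;  // imaginary part
};

// Both types must occupy exactly two doubles for a raw copy to be meaningful.
static_assert(sizeof(std::complex<double>) == 2 * sizeof(double),
              "std::complex<double> is expected to be two packed doubles");
static_assert(sizeof(DoubleComplexPair) == sizeof(std::complex<double>),
              "stand-in struct is expected to have no padding");

// Copy n complex values into the pair representation byte-for-byte.
std::vector<DoubleComplexPair> toPairs(const std::complex<double>* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    std::vector<DoubleComplexPair> out(n);
    if (n > 0) {
        std::memcpy(out.data(), src, n * sizeof(DoubleComplexPair));
    }
    return out;
}
```

Under that assumption, host arrays of std::complex<double> (or Fortran complex*16) can be handed to the CUBLAS copy routines with sizeof(cuDoubleComplex) as the element size, without an element-by-element conversion.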

Hi avidday,

Thanks for your kind reply. I have just tried the “thunking interface” method you mentioned, to test the function “ZGEMM”:

  1. I added the following Fortran code (built as a static library that is linked into the main C++ project):

      SUBROUTINE ZGEMM(TRANSA,TRANSB,M,N,K,ALPHA,A,LDA,B,LDB,BETA,C,LDC)
      DOUBLE COMPLEX ALPHA,BETA
      INTEGER K,LDA,LDB,LDC,M,N
      CHARACTER TRANSA,TRANSB
      DOUBLE COMPLEX A(LDA,*),B(LDB,*),C(LDC,*)
      CALL CUBLAS_ZGEMM(TRANSA,TRANSB,M,N,K,ALPHA,A,LDA,B,LDB,BETA,C,
     +                  LDC)
      END SUBROUTINE

  2. I added fortran.c to the C++ project (the main project), compiled it with the macro “CUBLAS_USE_THUNKING” defined, and then built the C++ project. However, it still crashed.

Could you please point out the problem for me?

Thanks,

Zhanghong Tang

Sorry, no I can’t - that is about as much as I can offer about your problem. Just to double check, which compiler(s) are you using for this?

The environment for this program is an NVIDIA GeForce 8600 GT + Windows XP Professional x64 Edition + VS2008 + Intel Fortran 11 + CUDA 3.0.

Thanks,

Zhanghong Tang

OK that explains it then. Your GPU can’t do double precision. You will have to use single precision.

Oh:(

Thanks for such a quick reply.

How about the GT 220? Or could you please suggest the cheapest GPU card that supports double-precision complex arithmetic?

Thanks,

Zhanghong Tang

The GTX 260 or GTX 275 are the least expensive cards that support double precision. At the moment, only the GTX 260/275/280/285/295 support double precision amongst the consumer cards. There are also Quadro and Tesla cards which support double precision, but they are considerably more expensive than any of the consumer cards.
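
For checking a card in code rather than from a list: double precision on CUDA GPUs requires compute capability 1.3 or higher, which is what the GTX 260/275/280/285/295 have in common. The GeForce 8600 GT reports 1.1 and, as far as I know, the GT 220 reports 1.2. A minimal sketch of the test follows; supportsDoublePrecision is a hypothetical helper, and in a real program the two numbers would come from the major and minor fields of the cudaDeviceProp structure filled in by cudaGetDeviceProperties.

```cpp
#include <cassert>

// Returns true when a device with the given compute capability has
// double-precision hardware, i.e. compute capability 1.3 or later.
bool supportsDoublePrecision(int major, int minor) {
    assert(major >= 1 && minor >= 0);
    return major > 1 || (major == 1 && minor >= 3);
}
```

Calling this once after start-up and refusing to take the Z (double complex) code path when it returns false gives a clear message instead of a failure deep inside a BLAS call.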

Dear avidday,

Thank you very much for such useful information.

Zhanghong Tang

Dear all,

I noticed that CUDA 3.0 has been released and that the latest CUBLAS supports all BLAS functions. I have also successfully linked CUBLAS into my application.

I have two questions:

  1. The data type in my program is complex*16, but my GPU card is a GeForce GT 220, which should not support that data type. However, when I ran the program on my machine, it ran without crashing. How can this be explained?

  2. When linked to the Intel MKL library, the program takes only about 30 seconds, but after linking to the CUBLAS library it has been running for about 30 minutes and has still not finished. What is the program doing?

Thanks,
Zhanghong Tang