Switching from CBLAS to CUBLAS

I have a program that uses CBLAS, and I have CUDA installed on my computer. What modifications do I need to make to my programs and makefiles to switch from CBLAS to CUBLAS?
Ubuntu 8.04, CUDA 2.0, GTX 8800

Basically, you’re going to add GPU-side memory allocations, copy the vectors/matrices to the card, call your BLAS functions, and copy the results back. The fewer transfers you make to and from the card, the better your performance will be.

CUBLAS has the Fortran BLAS interface (column-major ordering).

What do you mean? What difference does it make?

The CBLAS functions accept matrices in either row-major or column-major order. Each call takes an extra leading argument that specifies which order you are using. For example, a call to DGEMM looks like this:

cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, no_rows, m.no_cols, no_cols, 1.0,
data, no_rows, m.data, m.no_rows, 0.0, r.data, r.no_rows);
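To make the two layouts concrete, here is a small sketch in plain C (no BLAS needed). The helper names idx_row_major, idx_col_major and to_col_major are made up for illustration and are not part of CBLAS or CUBLAS. It shows where element (i, j) lives in each layout and how a row-major buffer can be repacked into column-major order.

```c
#include <assert.h>

/* Offset of element (i, j) in a row-major matrix with ncols columns. */
int idx_row_major(int i, int j, int ncols) { return i * ncols + j; }

/* Offset of element (i, j) in a column-major matrix with nrows rows. */
int idx_col_major(int i, int j, int nrows) { return i + j * nrows; }

/* Repack a rows x cols row-major buffer into column-major order.
   src and dst must not overlap. */
void to_col_major(const double *src, double *dst, int rows, int cols)
{
    assert(src != dst);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            dst[idx_col_major(i, j, rows)] = src[idx_row_major(i, j, cols)];
}
```

For the 2 x 3 matrix with rows (1 2 3) and (4 5 6), the row-major buffer is {1, 2, 3, 4, 5, 6}, while the column-major buffer is {1, 4, 2, 5, 3, 6}.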

CUBLAS expects the data in ColMajor (Fortran order), not RowMajor (C order), and it has no argument for choosing the order.
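If your existing code is row-major, you do not necessarily have to repack anything. A row-major m x k matrix occupies memory exactly like the column-major k x m transpose, so asking a column-major GEMM for C^T = B^T * A^T (that is, swapping the two operands and the m/n dimensions) leaves C in row-major order. The sketch below illustrates this in plain C. gemm_colmajor is a naive stand-in I wrote for a column-major-only GEMM, not the CUBLAS routine, and matmul_rowmajor is likewise a hypothetical wrapper name.

```c
#include <assert.h>

/* Naive stand-in for a column-major-only GEMM: C = A * B,
   where A is m x k, B is k x n, C is m x n, all column-major
   with leading dimensions lda, ldb, ldc. */
void gemm_colmajor(int m, int n, int k,
                   const double *A, int lda,
                   const double *B, int ldb,
                   double *C, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = sum;
        }
    }
}

/* Row-major C (m x n) = A (m x k) * B (k x n), computed with the
   column-major routine by swapping the operands: in column-major
   terms this computes C^T = B^T * A^T. */
void matmul_rowmajor(int m, int n, int k,
                     const double *A, const double *B, double *C)
{
    gemm_colmajor(n, m, k, B, n, A, k, C, n);
}
```

The same operand swap should carry over to a real column-major GEMM call; only the naive stand-in above would be replaced by the library routine operating on device pointers.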