I am testing cuBLAS with CUDA 4.0 in a simple standalone code,
especially the dgemm routine.
I am quite familiar with CUDA and the cuBLAS library, so I expect
my code to be correct. I also rechecked the cuBLAS
documentation for changes that could cause problems.
To be sure, I tried both the legacy and the new API.
The problem, simply put:
- cublasInit() (or equivalently cublasCreate()) takes 3 seconds
- the first call to cublas_dgemm() takes 3 seconds
Subsequent calls to cublas_dgemm() take the expected time.
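To narrow down where the 3 seconds go, here is a minimal sketch that times each startup phase separately. It assumes a current CUDA toolkit with C++11 support and the new cublas_v2.h API, not the exact CUDA 4.0 setup above. It uses two well-known pieces: cudaFree(0), a common idiom that forces CUDA context creation before cublasCreate() runs, and cudaDeviceSynchronize() after each cublasDgemm(), which is needed because the call is asynchronous. The names StartupTimings and time_cublas_startup are hypothetical, chosen only for this illustration.

```cuda
#include <cassert>
#include <chrono>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical result type: wall-clock time (ms) of each startup phase.
struct StartupTimings {
    double context_ms;      // cudaFree(0): forces CUDA context creation
    double create_ms;       // cublasCreate()
    double first_gemm_ms;   // first cublasDgemm(), including synchronization
    double second_gemm_ms;  // second cublasDgemm(), including synchronization
    bool ok;                // false if any CUDA/cuBLAS call failed
};

static double ms_since(std::chrono::steady_clock::time_point t0) {
    return std::chrono::duration<double, std::milli>(
        std::chrono::steady_clock::now() - t0).count();
}

// Hypothetical helper: times each phase for an n x n DGEMM, C = A * B.
StartupTimings time_cublas_startup(int n) {
    StartupTimings t = {0.0, 0.0, 0.0, 0.0, false};

    auto t0 = std::chrono::steady_clock::now();
    if (cudaFree(0) != cudaSuccess) return t;  // context is created here
    t.context_ms = ms_since(t0);

    cublasHandle_t handle;
    t0 = std::chrono::steady_clock::now();
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return t;
    t.create_ms = ms_since(t0);

    size_t bytes = sizeof(double) * n * n;
    std::vector<double> h(n * n, 1.0);  // A and B filled with ones
    double *A = nullptr, *B = nullptr, *C = nullptr;
    bool ok = cudaMalloc(&A, bytes) == cudaSuccess &&
              cudaMalloc(&B, bytes) == cudaSuccess &&
              cudaMalloc(&C, bytes) == cudaSuccess;
    if (ok) {
        ok = cudaMemcpy(A, h.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
             cudaMemcpy(B, h.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    }
    if (ok) {
        const double alpha = 1.0, beta = 0.0;
        double* slots[2] = {&t.first_gemm_ms, &t.second_gemm_ms};
        for (int i = 0; i < 2 && ok; ++i) {
            t0 = std::chrono::steady_clock::now();
            ok = cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                             &alpha, A, n, B, n, &beta, C, n) == CUBLAS_STATUS_SUCCESS;
            // cublasDgemm() only enqueues work; wait so the timing is real.
            ok = ok && cudaDeviceSynchronize() == cudaSuccess;
            *slots[i] = ms_since(t0);
        }
    }
    if (ok) {
        // With all-ones inputs, every entry of C should equal n.
        double c0 = 0.0;
        ok = cudaMemcpy(&c0, C, sizeof(double), cudaMemcpyDeviceToHost) == cudaSuccess &&
             c0 == static_cast<double>(n);
    }
    t.ok = ok;
    cudaFree(A);  // cudaFree(nullptr) is a no-op, so partial allocations are safe
    cudaFree(B);
    cudaFree(C);
    cublasDestroy(handle);
    return t;
}
```

If most of the delay shows up in context_ms rather than in create_ms or first_gemm_ms, that would suggest the time is spent creating the CUDA context rather than inside cuBLAS itself. This is a diagnostic sketch under the assumptions above, not a fix.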