Latency with CUDA 4.0 and CUBLAS

texane · March 15, 2011, 8:23pm

Hi,

I am testing CUBLAS with CUDA 4.0 in a simple standalone program,
in particular the dgemm routine.

I am quite familiar with CUDA and the CUBLAS library, so I expect
my code to be correct. I also rechecked the CUBLAS
documentation for changes that could cause problems.

To be sure, I tried both the legacy and the new API.

The problem, quite simply put:
. cublasInit() (or cublasCreate() with the new API) takes 3 seconds
. the first call to cublas_dgemm() takes 3 seconds

Subsequent calls to cublas_dgemm() take the expected time.

Is this normal? Am I missing something?

Thanks for helping,

f.

Christopher_Cameron · March 16, 2011, 9:00am

Yes, driver/context initialization and module loading can be expensive with the released CUDA 4.0 RC1 driver (and with previous drivers).

The next CUDA 4.0 RC driver will include optimizations that substantially decrease driver initialization, context initialization, and module load times.

texane · March 16, 2011, 11:58am

OK, thanks for confirming the “issue”. I will wait for the next RC.

Regards,

f.