Hi,
I’m trying to use some cuSolverDn functions (QR, SVD) but can’t seem to get them to actually run asynchronously. The basic sequence of operations is:
1. cudaStreamCreate() + cusolverDnSetStream()
2. allocate GPU memory (including workspace, parameters, etc.)
3. start timer
4. cudaMemcpyAsync( HtoD )
5. cusolverDnDgeqrf() / cusolverDnDgesvdj()
6. cudaMemcpyAsync( DtoH )
7. cudaLaunchHostFunc()
8. print timer
9. cudaStreamSynchronize()
10. print timer
All examples were tested on a Titan V GPU.
The times printed in steps 8 and 10 barely differ, which indicates that the cuSolver functions run synchronously, e.g. 3.605 s vs. 3.637 s for a particular problem size with SVD.
If I replace the cuSolver QR/SVD call in step 5 with cublasDgemm, I get 2.2e-04 s (step 8) and 6.6e-01 s (step 10), so that call does seem to run asynchronously.
Under what conditions do cuSolver functions really run asynchronously? Or is the phrase “prefer to keep asynchronous execution” in the docs an indication that many functions actually block?
Regards,
Ronald