Is cusolverDnCreate required for each GPU (cudaSetDevice)?


It seems that creating a cuSolverDN context (handle) can take a while and consumes a significant amount of CPU memory. In a multi-GPU application, should I call the function once for each GPU?


Yes, for the cuSOLVER (cusolverDn) API you need one handle per GPU. The cusolverMg API only requires a single handle.

Alright, thanks! Do you have any idea how much CPU resource the context handle takes for each GPU? Is it normal that creating it takes a while?

Depending on your system, creating multiple handles can take a while. One trick is to use OpenMP/OpenACC to create the handles in parallel, with one CPU thread per GPU:

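// Launch one CPU thread per GPU
omp_set_num_threads( numDevices );
#pragma omp parallel
{
    int ompId { omp_get_thread_num( ) };

    // We must set the device in each thread
    // so the correct CUDA context is visible
    CUDA_RT_CALL( cudaSetDevice( ompId ) );

    // Then create this GPU's handle from the same thread
    // (handles: assumed array of numDevices cusolverDnHandle_t)
    cusolverDnCreate( &handles[ompId] );
}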

Thanks for this idea! :)