Hello,
It seems that creating a context for cuSolverDN can take a while and consumes a significant amount of CPU memory. In a multi-GPU application, should I call the create function once for each GPU?
Thanks
Yes. With the cuSOLVER API you need one handle for each GPU. The cusolverMg API only requires a single handle.
Alright, thanks! Do you have any idea how much CPU memory the context handle will take for each GPU? Is it normal that creating it takes a while?
Depending on your system, it can take a while to create multiple handles. One trick is to use OpenMP/OpenACC to create the handles in parallel, one CPU thread per GPU.
// Launch one CPU thread per GPU
omp_set_num_threads( numDevices );
#pragma omp parallel
{
    int ompId { omp_get_thread_num( ) };
    // We must set the device in each thread
    // so the correct CUDA context is visible
    CUDA_RT_CALL( cudaSetDevice( ompId ) );
    // Then create this thread's handle here,
    // e.g. with cusolverDnCreate( )
}
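For anyone who is not using OpenMP, here is a minimal sketch of the same one-thread-per-GPU idea using std::thread. The helper name create_handles_in_parallel and the create callback are made up for illustration and are not part of cuSOLVER. The cudaSetDevice / cusolverDnCreate usage is shown only in a comment, so the snippet compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a cuSOLVER function): runs create(dev) on its own
// CPU thread for every device index and collects the results in device order.
template <typename Handle, typename CreateFn>
std::vector<Handle> create_handles_in_parallel(int numDevices, CreateFn create) {
    assert(numDevices >= 0);
    std::vector<Handle> handles(static_cast<std::size_t>(numDevices));
    std::vector<std::thread> workers;
    workers.reserve(handles.size());
    for (int dev = 0; dev < numDevices; ++dev) {
        // Each thread writes only its own slot, so no locking is needed.
        workers.emplace_back([&handles, &create, dev] {
            handles[static_cast<std::size_t>(dev)] = create(dev);
        });
    }
    for (auto& w : workers) {
        w.join();  // wait until every handle has been created
    }
    return handles;
}

// Assumed real-world usage (needs <cuda_runtime.h> and <cusolverDn.h>,
// error checking omitted):
//
//   auto handles = create_handles_in_parallel<cusolverDnHandle_t>(
//       numDevices, [](int dev) {
//           cudaSetDevice(dev);          // bind this thread to its GPU
//           cusolverDnHandle_t h{};
//           cusolverDnCreate(&h);        // slow call, now overlapped
//           return h;
//       });
```

The total memory cost is unchanged because every GPU still gets its own handle; only the wall-clock time of the creation calls can overlap. When you later use a handle, remember to call cudaSetDevice for the matching GPU first.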
Thanks for this idea ! :)