Hi,
Please advise me on an elegant way to solve the following problem. I need to run several cublasSgemm calls on large arrays and simultaneously compute something on my CPU. The algorithm looks like this:
for(i=0; i<K; i++)
{ cublasSgemm(… A+i*M, … ,A+(i+1)*M, …);
… run CPU part of my algorithm …
}
I have no estimates of the computation time of the CPU and GPU parts, and each call to cublasSgemm requires the data produced by the previous call.
To make these calls correctly, I would have to run something like:
for(i=0; i<K; i++)
{ cublasSgemm(… A+i*M, …, A+(i+1)*M, …);
cudaThreadSynchronize();
}
However, in this case the process will be blocked until each call finishes.
Am I right that I can run the two parts simultaneously only if I start two threads, with the first handling only the GPU part of my computations and the second handling the CPU part? Or is there a cleverer way to do this?
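To make the two-thread idea concrete, here is a minimal C++ sketch of what I mean. It is only an illustration under assumptions: gpu_step_stub and cpu_part_stub are hypothetical stand-ins (plain CPU loops) for the cublasSgemm call and for the CPU part of my algorithm, and run_overlapped is a name made up for this example. A real version would put the cublasSgemm call and the synchronization inside the first thread.

```cpp
#include <thread>
#include <vector>

// Hypothetical stand-in for one cublasSgemm step: block i+1 of A is computed
// from block i. A real version would call cublasSgemm here instead.
static void gpu_step_stub(std::vector<float>& A, int i, int M) {
    for (int j = 0; j < M; ++j)
        A[(i + 1) * M + j] = 2.0f * A[i * M + j];
}

// Hypothetical stand-in for the CPU part of the algorithm.
static double cpu_part_stub(int i) {
    return static_cast<double>(i) * i;
}

// Runs the K dependent "GPU" steps in their own thread while the calling
// thread does the K CPU steps; returns the sum of the CPU results.
// A must hold at least (K + 1) * M elements.
double run_overlapped(std::vector<float>& A, int K, int M) {
    std::thread gpu_thread([&A, K, M] {
        for (int i = 0; i < K; ++i)
            gpu_step_stub(A, i, M);   // each step needs the previous result
    });

    double cpu_sum = 0.0;
    for (int i = 0; i < K; ++i)
        cpu_sum += cpu_part_stub(i);  // does not touch A, so no locking here

    gpu_thread.join();                // wait for the whole chain before using A
    return cpu_sum;
}
```

In this sketch the CPU part never reads A, so the only synchronization point is the final join; if the CPU part needed intermediate results, some extra signalling between the threads would be required.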
Thank you in advance!
Sincerely
Ilghiz