Hi all, I know this is an old thread, but I couldn't find a more appropriate one for my question.
I'm starting to work with multi-GPU and I'd like to run some tests to learn. My goal is to implement a basic multi-GPU solution with a cuBLAS function (sgemv), but I'm a bit lost: there aren't many examples or much information about it.
My code looks roughly like this:
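#include <pthread.h>
#include <stdint.h>
#include <cuda_runtime.h>
#include <cublas.h> // legacy cuBLAS API (cublasInit/cublasAlloc/...)
void *calGPU(void *arg)
{
int dev = (int)(intptr_t)arg; // device index passed in from main()
int totalVectors...
int lenVectors
long int formatArray = (long int)totalVectors * lenVectors; // widen before multiplying
float *gpu_vecArr, *gpu_Mat, *gpu_dotProd;
....
....
// malloc ...
cudaSetDevice(dev); // select the GPU first; there is no cublasSetDevice
cublasInit();
cublasAlloc(totalVectors * lenVectors, sizeof(float), (void**)&gpu_vecArr);
cublasAlloc(lenVectors, sizeof(float), (void**)&gpu_Mat);
cublasAlloc(totalVectors, sizeof(float), (void**)&gpu_dotProd);
for ( vecI = 0; vecI < lenVectors; vecI++)
{
cublasSetVector(lenVectors, sizeof(float), &vecArr[vecI*len_tv], 1, gpu_Mat, 1);
// legacy order: trans, m, n, alpha, A, lda, x, incx, beta, y, incy
cublasSgemv('n', totalVectors, lenVectors, 1.0f, gpu_vecArr, totalVectors, gpu_Mat, 1, 0.0f, gpu_dotProd, 1);
// the result has totalVectors elements; "..." is the host result buffer
cublasGetVector(totalVectors, sizeof(float), gpu_dotProd, 1, ..., 1);
..
}
...
...
cublasShutdown();
return NULL;
}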
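int main()
{
...
pthread_t threads[gpuCount];
for (i = 0; i < gpuCount; i++)
pthread_create(&threads[i], NULL, calGPU, (void*)(intptr_t)i);
for (i = 0; i < gpuCount; i++)
pthread_join(threads[i], NULL); // wait for every GPU thread before exiting
return 0;
}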
How can I allocate memory on two GPUs? Do I need to create a handle? Is it possible to make this work? What is the best way to distribute the vectors? Is it automatic? I think I've read somewhere that cuBLAS manages that.
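To make the distribution question concrete, here is a CPU-only sketch of one manual way to split the work (an assumption, not verified cuBLAS behaviour): the rows of the column-major matrix are divided into contiguous blocks, one pthread per would-be GPU, and each thread computes its own block of y = A * x. The plain C loop in `sgemv_slice` only stands in for what `cublasSgemv` would do on one device; `SliceJob`, `row_range`, `sgemv_slice` and `sgemv_split` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Work for one worker thread (one GPU in the real program). */
typedef struct {
    const float *A;       /* full m x n matrix, column-major, lda = m */
    const float *x;       /* input vector, length n */
    float *y;             /* output vector, length m */
    int m, n;             /* matrix dimensions */
    int rowBegin, rowEnd; /* half-open row block owned by this worker */
} SliceJob;

/* Contiguous rows [*begin, *end) for worker w of `workers`;
   the first m % workers workers get one extra row. */
static void row_range(int m, int workers, int w, int *begin, int *end)
{
    assert(workers > 0 && w >= 0 && w < workers);
    int base = m / workers, rem = m % workers;
    *begin = w * base + (w < rem ? w : rem);
    *end = *begin + base + (w < rem ? 1 : 0);
}

/* Plain C stand-in for cublasSgemv on one device:
   y[r] = sum_j A[r + j*m] * x[j] for the rows in this block. */
static void *sgemv_slice(void *arg)
{
    SliceJob *job = (SliceJob *)arg;
    for (int r = job->rowBegin; r < job->rowEnd; r++) {
        float acc = 0.0f;
        for (int j = 0; j < job->n; j++)
            acc += job->A[r + (size_t)j * job->m] * job->x[j];
        job->y[r] = acc; /* blocks are disjoint, so no locking is needed */
    }
    return NULL;
}

/* y = A * x with the rows split across `workers` threads.
   Returns 0 on success, -1 on bad arguments or thread-creation failure. */
static int sgemv_split(const float *A, const float *x, float *y,
                       int m, int n, int workers)
{
    if (workers <= 0 || m < 0 || n < 0)
        return -1;
    pthread_t *threads = malloc(sizeof *threads * (size_t)workers);
    SliceJob *jobs = malloc(sizeof *jobs * (size_t)workers);
    int created = 0, rc = 0;
    if (threads == NULL || jobs == NULL) {
        rc = -1;
    } else {
        for (; created < workers; created++) {
            jobs[created] = (SliceJob){ .A = A, .x = x, .y = y, .m = m, .n = n };
            row_range(m, workers, created,
                      &jobs[created].rowBegin, &jobs[created].rowEnd);
            if (pthread_create(&threads[created], NULL, sgemv_slice,
                               &jobs[created]) != 0) {
                rc = -1;
                break;
            }
        }
        /* Join every thread that actually started. */
        for (int w = 0; w < created; w++)
            pthread_join(threads[w], NULL);
    }
    free(threads);
    free(jobs);
    return rc;
}
```

For example, `sgemv_split(A, x, y, totalVectors, lenVectors, gpuCount)` fills all of `y`. In a CUDA version, each thread would presumably call `cudaSetDevice` first, copy its row block with `cublasSetMatrix` (host pointer `&A[rowBegin]` with lda = totalVectors, device ldb = number of rows in the block), run `cublasSgemv` on that block and copy its part of y back; this is a sketch of the idea, not tested GPU code.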
Any help will be really appreciated.
Thanks.