I am trying to implement an algorithm on multiple GPUs. I followed the simpleMultiGPU example from the CUDA SDK, but I am running into a memory address problem. As in that example, I create the worker threads with
ThreadList[i] = (HANDLE)_beginthreadex( NULL, 0, &solverThread, plan, 0, NULL ); where “plan” is a struct. In the main thread I allocate memory for the variable “x” with
cudaHostAlloc((void **)&x,sizeof(float)*2,cudaHostAllocMapped); Then, inside the function “solverThread”, I call cutilSafeCall( cudaHostGetDevicePointer((void **)&x_gpu,(void *)x,0) ); to get the mapped address on the GPU. At that point the program reports an error, which I think comes from cutilSafeCall.
Alternatively, if I call cutilSafeCall( cudaHostGetDevicePointer((void **)&x_gpu,(void *)x,0) ); in the main thread and pass the resulting GPU pointer directly to “solverThread”, the program runs, but the result does not match what I get when I launch my kernel function directly from main. So I think the problem is related to _beginthreadex. Does anyone know how to solve it? Thank you very much!
In other words, if I allocate cudaHostAllocMapped memory in “main”, I cannot use cudaHostGetDevicePointer to get the device pointer in a thread created by “_beginthreadex”, and I also cannot pass the device pointer obtained in “main” directly to that thread.
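
For reference, here is a minimal sketch of one arrangement that may avoid both failures. It is not a confirmed fix. It assumes the cause is that, in the runtime of that SDK era, each host thread gets its own CUDA context, so a plain cudaHostAllocMapped allocation made in “main” is only known to main's context. The sketch adds cudaHostAllocPortable to the allocation flags and has each thread call cudaSetDeviceFlags(cudaDeviceMapHost) and cudaHostGetDevicePointer itself. The names Plan, scaleKernel, enableMapping and runSketch are hypothetical, std::thread stands in for _beginthreadex to keep the example portable, and the cutil macros are replaced by plain error checks.

```cuda
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical stand-in for the "plan" struct from the post.
struct Plan {
    int         device;  // GPU this worker should use
    float      *x;       // host pointer returned by cudaHostAlloc in main
    int         n;       // number of floats in x
    cudaError_t status;  // first error seen by the worker, or cudaSuccess
};

// Hypothetical kernel: doubles each element through the mapped pointer.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Ask for host-memory mapping on the calling thread's device. Some runtime
// versions report cudaErrorSetOnActiveProcess when the device is already
// initialized; that is tolerated here because the flag was set before the
// first initialization.
static cudaError_t enableMapping()
{
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceMapHost);
    if (err == cudaErrorSetOnActiveProcess) {
        cudaGetLastError();  // reset the recorded error
        err = cudaSuccess;
    }
    return err;
}

// Worker: looks up the device pointer in its own thread instead of
// receiving one that was obtained in main.
static void solverThread(Plan *plan)
{
    float *x_gpu = NULL;
    plan->status = cudaSetDevice(plan->device);
    if (plan->status == cudaSuccess) plan->status = enableMapping();
    if (plan->status == cudaSuccess)
        plan->status = cudaHostGetDevicePointer((void **)&x_gpu, (void *)plan->x, 0);
    if (plan->status != cudaSuccess) return;

    scaleKernel<<<1, 32>>>(x_gpu, plan->n);
    plan->status = cudaGetLastError();           // catches launch errors
    if (plan->status == cudaSuccess)
        plan->status = cudaDeviceSynchronize();  // finish before main reads x
}

// Allocates in the calling thread, runs one worker, copies the result out.
static int runSketch(float out[2])
{
    float *x = NULL;
    cudaError_t err = enableMapping();
    if (err == cudaSuccess)  // Portable: pinned for all contexts, not only this one
        err = cudaHostAlloc((void **)&x, sizeof(float) * 2,
                            cudaHostAllocPortable | cudaHostAllocMapped);
    if (err != cudaSuccess) {
        printf("setup failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    x[0] = 1.0f;
    x[1] = 2.0f;

    Plan plan = { 0, x, 2, cudaSuccess };  // device 0, so one GPU is enough
    std::thread worker(solverThread, &plan);
    worker.join();

    if (plan.status != cudaSuccess)
        printf("worker failed: %s\n", cudaGetErrorString(plan.status));
    out[0] = x[0];
    out[1] = x[1];
    cudaFreeHost(x);
    return plan.status == cudaSuccess ? 0 : 1;
}

int main()
{
    float out[2];
    int rc = runSketch(out);
    if (rc == 0) printf("x = %f %f\n", out[0], out[1]);
    return rc;
}
```

With several GPUs, each worker would get its own Plan with a different device index. On newer runtimes, where the threads of a process share one context per device, the extra flag and the per-thread lookup should be harmless but may no longer be required. In the SDK of that time, cudaThreadSynchronize played the role of cudaDeviceSynchronize.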