I want page-locked memory that is both portable and mapped in my multithreaded program.
So, can I do something like this: cudaHostAlloc((void **)&address, size, cudaHostAllocPortable | cudaHostAllocMapped)?
But when I call cudaHostAlloc in the main thread and then cudaHostGetDevicePointer() in a child thread, the call fails.
By the way, I am using a GTX 295, which has 2 GPUs.
Does anyone know how to do this?
Thanks.
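Here is a minimal sketch of the pattern I am describing (untested on my end; it assumes the device reports canMapHostMemory, and that cudaSetDeviceFlags(cudaDeviceMapHost) is called in each host thread before its context is created, i.e. before any other CUDA call in that thread):

```cuda
#include <stdio.h>
#include <pthread.h>
#include <cuda_runtime.h>

/* Shared host buffer, allocated once in the main thread. */
static float *h_buf = NULL;
static const size_t N = 1 << 20;

static void *worker(void *arg)
{
    /* Each host thread gets its own CUDA context; it must opt in to
       mapped pinned memory BEFORE its context is created. */
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaSetDevice(0);

    /* Try to get a device pointer for memory pinned by another thread. */
    float *d_ptr = NULL;
    cudaError_t err = cudaHostGetDevicePointer((void **)&d_ptr, h_buf, 0);
    printf("worker: cudaHostGetDevicePointer -> %s\n",
           cudaGetErrorString(err));
    return NULL;
}

int main(void)
{
    /* The mapping flag must also be set before the first CUDA call here. */
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaSetDevice(0);

    /* Portable: pinned in every context; Mapped: addressable from the device. */
    cudaHostAlloc((void **)&h_buf, N * sizeof(float),
                  cudaHostAllocPortable | cudaHostAllocMapped);

    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(&t, NULL);

    cudaFreeHost(h_buf);
    return 0;
}
```

If cudaSetDeviceFlags is skipped in the child thread (or called after another CUDA call has already created that thread's context), cudaHostGetDevicePointer would be expected to fail there.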
But section 3.2.5.1 of the manual says that portable page-locked memory can be shared and used by all threads.
I will be processing many large data sets, each bigger than 512 MB or 1 GB, so it is impractical to copy them to the device or to create a separate copy for each thread.
But the whole point of the cudaHostAllocPortable flag is to make the memory pinned in all contexts :)
Interestingly, this mention of error messages in the CUDA reference manual seems to indicate that you should be able to map memory allocated in a different thread:
(emphasis added)
Unfortunately, I don't have access to a system with both a G200 board and CUDA 2.2, so I can't test this myself.