I have tried to use pinned memory by creating the buffer with the CL_MEM_ALLOC_HOST_PTR flag and subsequently mapping it into host address space with a clEnqueueMapBuffer call, as explained in the OpenCL Best Practices Guide.
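For reference, the allocation pattern I am describing looks roughly like the sketch below (minimal host setup, error handling omitted; the buffer size and platform/device selection are placeholders, not my actual values):

```c
#include <CL/cl.h>
#include <stdlib.h>

int main(void) {
    cl_int err;
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    size_t size = 64 * 1024 * 1024;  /* example size, 64 MiB */

    /* Create the buffer with CL_MEM_ALLOC_HOST_PTR so the runtime can
     * back it with page-locked (pinned) host memory. */
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_ALLOC_HOST_PTR, size, NULL, &err);

    /* Map the buffer to obtain a host pointer. With pinned memory this
     * is expected to be a zero-copy mapping, but the profiler output
     * discussed below suggests a device-to-host transfer occurs here. */
    void *host_ptr = clEnqueueMapBuffer(queue, buf, CL_TRUE,
                                        CL_MAP_WRITE, 0, size,
                                        0, NULL, NULL, &err);

    /* ... fill host_ptr and use the buffer in kernels ... */

    clEnqueueUnmapMemObject(queue, buf, host_ptr, 0, NULL, NULL);
    clReleaseMemObject(buf);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}
```

Compiling and running this requires an OpenCL runtime and a GPU, so it is shown only to pin down the API calls in question.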
Everything works fine, i.e. data transfers and kernel executions run concurrently, as long as the sum of the pinned buffer and the other global memory buffers on the GPU does not exceed the total global memory available on the card. If I enlarge the pinned buffer beyond that limit, the kernel execution crashes.
In the Visual Profiler, I can see page-locked "memcpyDtoHAsync" and "memcpyHtoDAsync" transfers of the size of the pinned buffer when mapping and unmapping it, respectively. Thus, it seems that, in contrast to CUDA, the pinned memory buffer is allocated not only in host memory but also in global device memory. Has anyone else experienced this behaviour?