Hi,
I’m currently trying to understand how to force OpenCL to use pinned memory instead of pageable memory. To see how it works I had a look at the oclBandwidth example, but there is something I don’t understand.
In this program two different buffers are created: one for the pinned memory (cmPinnedData) and one target buffer (cmDevData) into which we want to copy the data.
During the initialization phase, the pinned buffer is mapped to a host array h_data, from which we later want to write into the device buffer, and the data is initialized through that pointer. Then the pinned buffer is unmapped and the target device buffer is created.
During the copy phase, h_data is mapped to the pinned memory buffer again and we write into the target buffer from the h_data array.
But what I don’t understand is this: if we map h_data to the pinned memory buffer, then h_data no longer points to the initial array in host memory but to a host array that corresponds to the pinned memory buffer.
So, to use pinned memory for writing to the device, should we first copy the data into pinned memory through a mapping and then copy this mapped array into the target buffer? In that case, why don’t we count the first mapping in the total elapsed time?
Does anybody understand how this works? Here is the relevant code from the sample:
// Create a host buffer
cmPinnedData = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, memSize, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);
// Get a mapped pointer
h_data = (unsigned char*)clEnqueueMapBuffer(cqCommandQueue, cmPinnedData, CL_TRUE, CL_MAP_WRITE, 0, memSize, 0, NULL, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);
// initialize
for(unsigned int i = 0; i < memSize/sizeof(unsigned char); i++)
{
    h_data[i] = (unsigned char)(i & 0xff);
}
// unmap and make data in the host buffer valid
ciErrNum = clEnqueueUnmapMemObject(cqCommandQueue, cmPinnedData, (void*)h_data, 0, NULL, NULL);
oclCheckError(ciErrNum, CL_SUCCESS);
// allocate device memory
cmDevData = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE, memSize, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);
// Sync queue to host, start timer 0, and copy data from Host to GPU
clFinish(cqCommandQueue);
shrDeltaT(0);
// Get a mapped pointer
h_data = (unsigned char*)clEnqueueMapBuffer(cqCommandQueue, cmPinnedData, CL_TRUE, CL_MAP_WRITE, 0, memSize, 0, NULL, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);
// DIRECT: API access to device buffer
for(unsigned int i = 0; i < MEMCOPY_ITERATIONS; i++)
{
    ciErrNum = clEnqueueWriteBuffer(cqCommandQueue, cmDevData, CL_FALSE, 0, memSize, h_data, 0, NULL, NULL);
    oclCheckError(ciErrNum, CL_SUCCESS);
}
ciErrNum = clFinish(cqCommandQueue);
oclCheckError(ciErrNum, CL_SUCCESS);
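
To make my current reading of the sequence concrete, here is a toy model in plain C. It contains no OpenCL calls at all: `toy_buffer`, `toy_create`, `toy_map`, `toy_unmap`, `toy_write`, `toy_release` and `toy_replay` are hypothetical names I made up, and the model assumes that mapping a buffer created with CL_MEM_ALLOC_HOST_PTR simply hands back a pointer into host storage owned by the runtime, with no separate pageable array involved. A real implementation is allowed to behave differently (for example, it may return a different address on each map), so this is only a sketch of what I think happens, not of what the driver actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a cl_mem object. For the "pinned" buffer the storage
 * models host memory owned by the runtime (ASSUMPTION: this is what
 * CL_MEM_ALLOC_HOST_PTR gives us); for the "device" buffer it models
 * device memory. */
typedef struct {
    unsigned char *storage;
    size_t size;
    int mapped; /* nonzero while the application holds a mapped pointer */
} toy_buffer;

/* Models clCreateBuffer. Returns 0 on success, -1 on allocation failure. */
int toy_create(toy_buffer *b, size_t size) {
    b->storage = calloc(size, 1);
    b->size = size;
    b->mapped = 0;
    return b->storage ? 0 : -1;
}

/* Models clEnqueueMapBuffer: returns a pointer INTO the buffer's own
 * storage. Nothing is copied in this model. */
unsigned char *toy_map(toy_buffer *b) {
    b->mapped = 1;
    return b->storage;
}

/* Models clEnqueueUnmapMemObject: the pointer becomes invalid for the
 * application, but the buffer keeps its contents. */
void toy_unmap(toy_buffer *b, unsigned char *ptr) {
    assert(b->mapped && ptr == b->storage);
    b->mapped = 0;
}

/* Models clEnqueueWriteBuffer: copies from a host pointer into dst. */
void toy_write(toy_buffer *dst, const unsigned char *src, size_t n) {
    assert(n <= dst->size);
    memcpy(dst->storage, src, n);
}

/* Models clReleaseMemObject. */
void toy_release(toy_buffer *b) {
    free(b->storage);
    b->storage = NULL;
}

/* Replays the sequence from the sample. Returns 1 if the "device"
 * buffer ends up holding the initialization pattern, 0 otherwise. */
int toy_replay(size_t mem_size) {
    toy_buffer pinned, dev;
    if (toy_create(&pinned, mem_size) != 0) return 0;
    if (toy_create(&dev, mem_size) != 0) { toy_release(&pinned); return 0; }

    /* Initialization phase (before the timer starts in the sample):
     * the pattern is written straight into the pinned storage. */
    unsigned char *h_data = toy_map(&pinned);
    for (size_t i = 0; i < mem_size; i++) {
        h_data[i] = (unsigned char)(i & 0xff);
    }
    toy_unmap(&pinned, h_data);

    /* Copy phase (timed in the sample): map again, then write to the
     * device buffer from the mapped pointer. The data is still there. */
    h_data = toy_map(&pinned);
    toy_write(&dev, h_data, mem_size);
    toy_unmap(&pinned, h_data);

    int ok = 1;
    for (size_t i = 0; i < mem_size; i++) {
        if (dev.storage[i] != (unsigned char)(i & 0xff)) ok = 0;
    }
    toy_release(&pinned);
    toy_release(&dev);
    return ok;
}
```

Calling `toy_replay(1024)` from a small `main` returns 1 in this model. If the model is right, there never is an "initial array" outside the pinned buffer, and the first map only exists so the initialization loop has somewhere to write, which would explain why it sits outside the timed region. Whether the real runtime matches this is exactly what I would like someone to confirm.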