Host to device transfer using pinned memory in oclBandwidth

Hi,

I’m currently trying to understand how to make OpenCL use pinned memory instead of pageable memory. To see how it works I had a look at the oclBandwidth example, but there is something I don’t understand.

In this program two different buffers are created: one for the pinned memory (cmPinnedData) and one target buffer (cmDevData) into which we want to copy the data.

During the initialization phase, the pinned buffer is bound (mapped) to a host array h_data, and the buffer is initialized by writing through h_data. Then h_data is unbound (unmapped) and the target device buffer is created.

During the copy phase, h_data is bound again to the pinned memory buffer and we write into the target buffer from the h_data array.

But what I don’t understand is this: if we bind h_data to the pinned memory buffer, then h_data no longer points to the initial array in host memory but to a host array that corresponds to the pinned memory buffer.

So to use pinned memory for writing to the device, should we first copy the data into pinned memory through a mapping and then copy this mapped array into the target buffer? In that case, why isn’t the first mapping counted in the total elapsed time?
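
To make the timing part of my question concrete, here is a small sketch of how I assume the elapsed time is turned into a bandwidth figure. The helper name bandwidth_mb_per_s is my own (hypothetical, not taken from oclBandwidth), and I am only assuming that the sample computes something along these lines.

```c
#include <stddef.h>

/* Hypothetical helper, not part of oclBandwidth: bandwidth in MB/s
 * (1 MB = 2^20 bytes) for `iterations` copies of `bytes` bytes that
 * together took `seconds` of wall-clock time. */
static double bandwidth_mb_per_s(size_t bytes, unsigned int iterations, double seconds)
{
    /* Total bytes moved divided by elapsed time, scaled to MB. */
    return ((double)bytes * (double)iterations) / (seconds * (double)(1 << 20));
}
```

For example, 100 copies of 32 MiB in 0.5 s would give 6400 MB/s. If a mapping that cost a made-up 10 ms were included in the timed region, the same call with 0.51 s would give about 6275 MB/s, so whether the mapping is timed or not changes the reported number.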

Does anybody understand how it works?

// Create a host buffer
cmPinnedData = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, memSize, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);

// Get a mapped pointer
h_data = (unsigned char*)clEnqueueMapBuffer(cqCommandQueue, cmPinnedData, CL_TRUE, CL_MAP_WRITE, 0, memSize, 0, NULL, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);

// Initialize
for(unsigned int i = 0; i < memSize/sizeof(unsigned char); i++)
{
    h_data[i] = (unsigned char)(i & 0xff);
}

// Unmap and make data in the host buffer valid
ciErrNum = clEnqueueUnmapMemObject(cqCommandQueue, cmPinnedData, (void*)h_data, 0, NULL, NULL);
oclCheckError(ciErrNum, CL_SUCCESS);

// Allocate device memory
cmDevData = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE, memSize, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);

// Sync queue to host, start timer 0, and copy data from Host to GPU
clFinish(cqCommandQueue);
shrDeltaT(0);

// Get a mapped pointer
h_data = (unsigned char*)clEnqueueMapBuffer(cqCommandQueue, cmPinnedData, CL_TRUE, CL_MAP_WRITE, 0, memSize, 0, NULL, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);

// DIRECT: API access to device buffer
for(unsigned int i = 0; i < MEMCOPY_ITERATIONS; i++)
{
    ciErrNum = clEnqueueWriteBuffer(cqCommandQueue, cmDevData, CL_FALSE, 0, memSize, h_data, 0, NULL, NULL);
    oclCheckError(ciErrNum, CL_SUCCESS);
}
ciErrNum = clFinish(cqCommandQueue);
oclCheckError(ciErrNum, CL_SUCCESS);