Host to device transfer using pinned memory in oclBandwidth

revers · January 6, 2010, 5:25pm

Hi,

I’m currently trying to understand how to force OpenCL to use pinned memory instead of pageable memory. To see how it works I had a look at the oclBandwidth example, but there is something I don’t understand.

In this program two different buffers are created: one for the pinned memory (cmPinnedData) and one target buffer (cmDevData) into which we want to copy the data.

During the initialization phase, the pinned buffer is bound (mapped) to a host array h_data, from which we want to write into the buffer, and both are initialized at the same time. Then the buffer is unbound (unmapped) and the target device buffer is created.

During the copy phase, h_data is bound again to the pinned memory buffer and we write into the target buffer from the h_data array.

What I don’t understand is this: if we bind h_data to the pinned memory buffer, then h_data no longer points to the initial array in host memory but to a host array that corresponds to the pinned memory buffer.

So, to use pinned memory for writing to the device, should we first copy the data into pinned memory through a mapping and then copy this mapped array into the target buffer? In that case, why isn’t the first mapping counted in the total elapsed time?

Does anybody understand how this works?

// Create a host buffer
cmPinnedData = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR, memSize, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);

// Get a mapped pointer
h_data = (unsigned char*)clEnqueueMapBuffer(cqCommandQueue, cmPinnedData, CL_TRUE, CL_MAP_WRITE, 0, memSize, 0, NULL, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);

// initialize
for(unsigned int i = 0; i < memSize/sizeof(unsigned char); i++)
{
    h_data[i] = (unsigned char)(i & 0xff);
}

// unmap and make data in the host buffer valid
ciErrNum = clEnqueueUnmapMemObject(cqCommandQueue, cmPinnedData, (void*)h_data, 0, NULL, NULL);
oclCheckError(ciErrNum, CL_SUCCESS);

// allocate device memory
cmDevData = clCreateBuffer(cxGPUContext, CL_MEM_READ_WRITE, memSize, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);

// Sync queue to host, start timer 0, and copy data from Host to GPU
clFinish(cqCommandQueue);
shrDeltaT(0);

// Get a mapped pointer
h_data = (unsigned char*)clEnqueueMapBuffer(cqCommandQueue, cmPinnedData, CL_TRUE, CL_MAP_WRITE, 0, memSize, 0, NULL, NULL, &ciErrNum);
oclCheckError(ciErrNum, CL_SUCCESS);

// DIRECT:  API access to device buffer
for(unsigned int i = 0; i < MEMCOPY_ITERATIONS; i++)
{
    ciErrNum = clEnqueueWriteBuffer(cqCommandQueue, cmDevData, CL_FALSE, 0, memSize, h_data, 0, NULL, NULL);
    oclCheckError(ciErrNum, CL_SUCCESS);
}
ciErrNum = clFinish(cqCommandQueue);
oclCheckError(ciErrNum, CL_SUCCESS);
