Why is the device pointer size 32-bit on x64?

Hello, all.

I am porting my program from CUDA to OpenCL.

Now I am having trouble with the difference in pointer size between the host and the device.

In 64-bit CUDA, the pointer size is the same on both sides (host = device = 64-bit).

But in 64-bit OpenCL, the pointer size differs (host = 64-bit, device = 32-bit).

I checked this with the following OpenCL test program.

I would appreciate any help.


My Environment

OS : Windows7 64-bit

CPU : Intel Xeon E5620

GPU : NVIDIA Tesla C2050

SDK : NVIDIA GPU Computing SDK 4.0, Visual Studio 2010 Express

Test Program:

/* host.c */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

#define BUF_SIZE (32)
#define MAX_SOURCE_SIZE (0x100000)

int main(void)
{
	cl_device_id device_id = NULL;
	cl_context context = NULL;
	cl_command_queue command_queue = NULL;
	cl_mem memobj = NULL;
	cl_program program = NULL;
	cl_kernel kernel = NULL;
	cl_platform_id platform_id = NULL;
	cl_uint ret_num_devices;
	cl_uint ret_num_platforms;
	cl_int ret;

	char buf[BUF_SIZE];

	FILE *fp;
	char fileName[] = "./device.cl";
	char *source_str;
	size_t source_size;

	/* Load the kernel source code; stop if the file cannot be opened */
	fp = fopen(fileName, "r");
	if (!fp) {
		fprintf(stderr, "Failed to load kernel.\n");
		return 1;
	}
	source_str = (char*)malloc(MAX_SOURCE_SIZE);
	source_size = fread(source_str, 1, MAX_SOURCE_SIZE, fp);
	fclose(fp);

	/* Platform, device, context, queue and buffer setup
	   (checks on ret are omitted to keep the test short) */
	ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
	ret = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &ret_num_devices);
	context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &ret);
	command_queue = clCreateCommandQueue(context, device_id, 0, &ret);
	memobj = clCreateBuffer(context, CL_MEM_READ_WRITE, BUF_SIZE * sizeof(char), NULL, &ret);

	/* Build the program and run the kernel as a single work-item */
	program = clCreateProgramWithSource(context, 1, (const char **)&source_str,
					(const size_t *)&source_size, &ret);
	ret = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
	kernel = clCreateKernel(program, "device_func", &ret);
	ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&memobj);
	ret = clEnqueueTask(command_queue, kernel, 0, NULL, NULL);

	/* Blocking read: buf[0] now holds sizeof(char*) as seen by the device */
	ret = clEnqueueReadBuffer(command_queue, memobj, CL_TRUE, 0,
				BUF_SIZE * sizeof(char), buf, 0, NULL, NULL);

	/* sizeof yields size_t, so cast it to int to match %d */
	printf("Host Pointer Size  : %d byte\n", (int)sizeof(char *));
	printf("Device Pointer Size: %d byte\n", buf[0]);

	ret = clFlush(command_queue);
	ret = clFinish(command_queue);
	ret = clReleaseKernel(kernel);
	ret = clReleaseProgram(program);
	ret = clReleaseMemObject(memobj);
	ret = clReleaseCommandQueue(command_queue);
	ret = clReleaseContext(context);
	free(source_str);

	return 0;
}

/* device.cl */
__kernel void device_func(__global char* buf)
{
	/* Store the device-side pointer size in the first byte */
	buf[0] = sizeof(char*);
}

Test Program Result:

Host Pointer Size  : 8 byte
Device Pointer Size: 4 byte

Same problem on a GeForce GT 520:

Expected min alignment for buffers is 512 bytes.
Work group size 1024.
Preferred work group size multiple is 32.
OS : Windows7 64-bit

I installed the latest NVIDIA toolkit (NVIDIA_Parallel_Nsight_Win64_2.2.0.12110.msi).

I checked my link path: “C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\lib\x64”.

How can this be?

Are there any NVIDIA gurus here?

This cannot be left unanswered if you really want OpenCL to be truly open.

Any answer will be highly appreciated.

Thank you.

My guess is that if your kernels need no more than 4 GB of memory, the compiler avoids shooting itself in the foot by making every pointer inside the kernel 64-bit when there is no need to. A 32-bit pointer can already address 2^32 bytes = 4 GB.

This can certainly cause problems when porting applications. I also recall a related thread where people reported that only 4 GB of the C2070's 6 GB could be used; perhaps this is the reason. Try searching the forums for that topic.

Another guess: try using double-precision types in your kernel. That still wouldn't require 64-bit pointers on the device, but it might trigger something (compilers are not almighty, and this might have been overlooked).

Don’t expect NVIDIA employees to post answers on the forum. I have never seen that happen.