Why is device pointer size 32-bit in x64?

Susie_Zero · February 17, 2012, 4:38am

Hello, all.

I am porting my program from CUDA to OpenCL.

And now I am troubled by the difference in pointer size between host and device.

In CUDA x64, the pointer size is the same. (Host=Device=64-bit)

But in OpenCL x64, the pointer size is different. (Host=64-bit, Device=32-bit)

I tested it using the following test program in OpenCL.

I will appreciate any help.

Thanks.

My Environment

OS : Windows7 64-bit

CPU : Intel Xeon E5620

GPU : NVIDIA Tesla C2050

SDK : NVIDIA GPU Computing SDK 4.0, Visual Studio2010 express

Test Program:

/* host.c */

#include <stdio.h>

#include <stdlib.h>

#include <CL/cl.h>

#define BUF_SIZE (32)

#define MAX_SOURCE_SIZE (0x100000)

int main()

{

	cl_device_id device_id = NULL;

	cl_context context = NULL;

	cl_command_queue command_queue = NULL;

	cl_mem memobj = NULL;

	cl_program program = NULL;

	cl_kernel kernel = NULL;

	cl_platform_id platform_id = NULL;

	cl_uint ret_num_devices;

	cl_uint ret_num_platforms;

	cl_int ret;

	

	char buf[BUF_SIZE];

	

	FILE *fp;

	char fileName[] = "./device.cl";

	char *source_str;

	size_t source_size;

	/* Load Kernel Source Code */

	fp = fopen(fileName, "r");

	if (!fp)

	{

		fprintf(stderr, "Failed to load kernel.\n");

		exit(1);

	}

	source_str = (char*)malloc(MAX_SOURCE_SIZE);

	source_size = fread(source_str, 1, MAX_SOURCE_SIZE, fp);

	fclose(fp);

	

	ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);

	ret = clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &ret_num_devices);

	context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &ret);

	command_queue = clCreateCommandQueue(context, device_id, 0, &ret);

	memobj = clCreateBuffer(context, CL_MEM_READ_WRITE, BUF_SIZE * sizeof(char), NULL, &ret);

	program = clCreateProgramWithSource(context, 1, (const char **)&source_str,

					(const size_t *)&source_size, &ret);

	ret = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);

	kernel = clCreateKernel(program, "device_func", &ret);

	ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&memobj);

	ret = clEnqueueTask(command_queue, kernel, 0, NULL, NULL);

	ret = clEnqueueReadBuffer(command_queue, memobj, CL_TRUE, 0,

				BUF_SIZE * sizeof(char), buf, 0, NULL, NULL);

	

	/* sizeof yields size_t, so cast it to match the %d conversion */
	printf("Host Pointer Size  : %d byte\n", (int)sizeof(char *));

	printf("Device Pointer Size: %d byte\n", buf[0]);

	ret = clFlush(command_queue);

	ret = clFinish(command_queue);

	ret = clReleaseKernel(kernel);

	ret = clReleaseProgram(program);

	ret = clReleaseMemObject(memobj);

	ret = clReleaseCommandQueue(command_queue);

	ret = clReleaseContext(context);

	free(source_str);

	return 0;

}

/* device.cl */

__kernel void device_func(__global char* buf)

{

	buf[0] = sizeof(char*);

}

Test Program Result:

Host Pointer Size  : 8 byte

Device Pointer Size: 4 byte

cgun · April 27, 2012, 10:32am

Same problem on a GeForce GT 520…
Expected min alignment for buffers is 512 bytes…
Work group size: 1024.
Preferred work group size multiple is 32.
OS : Windows7 64-bit

I installed the latest NVIDIA toolkit (NVIDIA_Parallel_Nsight_Win64_2.2.0.12110.msi).

I checked my link path “C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\lib\x64”.

How can this be?

Any guru from the NVIDIA guys?

This cannot be left unanswered if you guys really want to make OpenCL a really open CL.

Any answer will be highly appreciated.

Thank you.

Meteorhead · April 27, 2012, 11:38am

I’d guess that if your kernels need no more than 4GB of memory, the compiler avoids shooting itself in the foot by making every pointer inside the kernel 64-bit when there’s really no need to.

I’m sure this might cause problems when porting applications. However, I do recall another problem, where people were arguing that only 4GB of the C2070’s 6GB can be used… perhaps this is the reason. Do search the forums for this topic.

Another guess: try incorporating double precision types into your kernel. Although that still wouldn’t require 64-bit pointers on the device, it might trigger something. (Compilers are not almighty, and this might have been overlooked.)

Don’t expect NVIDIA employees to post answers on the forum. I have never seen such an occasion.
