Doubt about cudaMemcpy and cudaMemcpyToSymbol

Hi! This is my first post here on the forum. I am analyzing a program written in CUDA and I have a question about its use of cudaMemcpy and cudaMemcpyToSymbol.

The problem consists of transferring some parameters needed to generate random numbers with a Mersenne Twister random number generator for GPUs. The parameters are calculated on the host, stored in an array, and then transferred to the GPU. I will not post the entire code because it is huge, only the transferring part:

// allocate device memory for tables
cudaMalloc(&d_MT_tables, size_tbl); ce(3002);

// copy tables to device
cudaMemcpy(d_MT_tables, h_MT_tables, size_tbl, cudaMemcpyHostToDevice); ce(3003);
cudaMemcpyToSymbol(MT_tables, &d_MT_tables, sizeof(MT_tables_t*)); ce(3004);

h_MT_tables is an array of the struct MT_tables_t, whose definition is:

typedef struct MT_tables_t {
    uint recursion[MT_TABLE_SIZE];
    uint tempering[MT_TABLE_SIZE];
} MT_tables_t;

uint size_tbl = num * sizeof(MT_tables_t);
MT_tables_t *h_MT_tables = (MT_tables_t *)malloc(size_tbl);

d_MT_tables and MT_tables are pointers, defined as:

MT_tables_t *d_MT_tables;
__constant__ MT_tables_t *MT_tables;

Well, I can understand the first transfer: the parameters stored in h_MT_tables are copied to the GPU, into the memory area pointed to by d_MT_tables, using cudaMemcpy. The number of bytes transferred corresponds to the size of h_MT_tables, so the entire array is transferred.

The second transfer, using cudaMemcpyToSymbol, is where my questions start. I have read that it must be used when one wants to transfer data to the constant memory space of the device, which seems to be the case here. But there are two problems: first, only a small portion of the original data is transferred to MT_tables, because sizeof(MT_tables_t*) is far smaller than size_tbl (I checked in Visual Studio); and second (which makes things weird), d_MT_tables is NEVER used in the kernel … only MT_tables …

So the question arises: what did the programmer intend here? It seems weird that the entire h_MT_tables is transferred to the device, but then only the very small portion saved in MT_tables is actually used by the kernel. Thanks for your help.

I am unsure why he is copying things he’s not using.

But I can tell you that doing this:

sizeof(MT_tables_t*)

is most likely the source of your confusion. Think about what that expression means:

The size of the pointer itself, not the size of the buffer of memory it’s pointing to.

It will return either 4 bytes on a 32-bit build or 8 bytes on a 64-bit build.

It’s a common programming mistake, independent of CUDA.

Thanks for the answer!

I found this thread in another forum, and it seems quite clear what the programmer intends:

I extract the “interesting” part of the post:

“”

To my knowledge you could also cudaMemcpyToSymbol a pointer to a pointer (unlike your example, where you’re copying an array to a pointer), but beware: only that pointer will be constant, not the memory it’s pointing to on your device. If you were going to go this route, you would need to add a cudaMalloc, then cudaMemcpyToSymbol the resulting pointer to device memory into your constant device variable. AGAIN, in this case the array values WILL NOT be constant – ONLY the pointer to the memory will be.

Your call for this case would be something like:

int *d_dic;
cudaMalloc((void **)&d_dic, num_codewords(bSizeY*bSizeX)*sizeof(int));
cudaMemcpyToSymbol(c_dic_ptr, &d_dic, sizeof(int *));

“”
I suppose that would accelerate access to the data on the device.
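Putting the pieces of the thread together, the whole pattern could be sketched like this (a sketch, not the original program: names follow the post, MT_TABLE_SIZE and the number of tables are assumed values, and error checking is omitted; it needs a CUDA-capable device to actually run):

```cuda
#include <cstdio>
#include <cstdlib>

#define MT_TABLE_SIZE 624   // value assumed for illustration
#define NUM_TABLES     32   // stands in for `num` in the original post

typedef unsigned int uint;

typedef struct MT_tables_t {
    uint recursion[MT_TABLE_SIZE];
    uint tempering[MT_TABLE_SIZE];
} MT_tables_t;

// Only the POINTER lives in constant memory; the tables themselves
// stay in ordinary global memory allocated with cudaMalloc.
__constant__ MT_tables_t *MT_tables;

__global__ void use_tables(uint *out)
{
    // The kernel reads through the constant pointer, never d_MT_tables.
    out[threadIdx.x] = MT_tables[blockIdx.x].recursion[threadIdx.x];
}

int main()
{
    size_t size_tbl = NUM_TABLES * sizeof(MT_tables_t);

    MT_tables_t *h_MT_tables = (MT_tables_t *)malloc(size_tbl);
    // ... fill h_MT_tables on the host ...

    MT_tables_t *d_MT_tables;
    cudaMalloc(&d_MT_tables, size_tbl);

    // 1) copy the whole table array into global device memory
    cudaMemcpy(d_MT_tables, h_MT_tables, size_tbl, cudaMemcpyHostToDevice);

    // 2) copy just the 4/8-byte pointer value into the constant symbol
    cudaMemcpyToSymbol(MT_tables, &d_MT_tables, sizeof(MT_tables_t *));

    // kernel launch omitted; kernels access the data only via MT_tables

    free(h_MT_tables);
    cudaFree(d_MT_tables);
    return 0;
}
```

So the small cudaMemcpyToSymbol transfer is intentional: d_MT_tables exists only on the host side as a handle, and the kernel reaches the full table array through the pointer cached in constant memory.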