Question about cudaMemcpy and cudaMemcpyToSymbol

Hi! This is my first post on the forum. I am analyzing a program written in CUDA and I have a question about how it uses cudaMemcpy and cudaMemcpyToSymbol.

The program transfers some parameters needed to generate random numbers with a Mersenne Twister random number generator for GPUs. The parameters are calculated on the host, stored in an array, and then transferred to the GPU. I will not post the entire code because it is huge, only the transfer part:

// allocate device memory for tables
cudaMalloc(&d_MT_tables, size_tbl); ce(3002);

// copy tables to device
cudaMemcpy(d_MT_tables, h_MT_tables, size_tbl, cudaMemcpyHostToDevice); ce(3003);
cudaMemcpyToSymbol(MT_tables, &d_MT_tables, sizeof(MT_tables_t*)); ce(3004);

h_MT_tables is an array of MT_tables_t structs. The struct and the host allocation are defined as:

typedef struct MT_tables_t {
    uint recursion[MT_TABLE_SIZE];
    uint tempering[MT_TABLE_SIZE];
} MT_tables_t;

uint size_tbl = num * sizeof(MT_tables_t);
MT_tables_t *h_MT_tables = (MT_tables_t *)malloc(size_tbl);

d_MT_tables and MT_tables are pointers, defined as:

MT_tables_t *d_MT_tables;
__constant__ MT_tables_t *MT_tables;

I understand the first part of the transfer: the parameters stored in h_MT_tables are copied with cudaMemcpy to the GPU memory area pointed to by d_MT_tables. The number of bytes transferred is size_tbl, the size of the whole h_MT_tables array, so the entire array is transferred.

The second transfer, using cudaMemcpyToSymbol, is the one I have questions about. I have read that it must be used to transfer data to the constant memory space of the device, which seems to be the case here. But there are two problems. First, only a small portion of the original data is transferred to MT_tables, because sizeof(MT_tables_t*) is far smaller than size_tbl (I checked in Visual Studio). Second, and this is what makes things weird, d_MT_tables is NEVER used in the kernel, only MT_tables.

So the question is: what did the programmer intend here? It seems weird that the entire h_MT_tables is transferred to the device, but only the very small portion stored in MT_tables is actually used by the kernel. Thanks for your help.

I am unsure why he is copying things he’s not using.

But I can tell you that doing this:

cudaMemcpyToSymbol(MT_tables, &d_MT_tables, sizeof(MT_tables_t*));

is most likely the source of your error. Think about what this expression means:

sizeof(MT_tables_t*)

It is the size of the pointer itself, not the size of the buffer of memory it’s pointing to.

It evaluates to 4 bytes on a 32-bit build or 8 bytes on a 64-bit build.

It’s a common programming mistake independent of CUDA.
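To make the difference concrete, here is a small host-only sketch that compares the two sizes. The value of MT_TABLE_SIZE and the helper names pointer_bytes, buffer_bytes and print_sizes are made up for illustration; they are not from the original program.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>

#define MT_TABLE_SIZE 16  // assumed value, for illustration only

typedef unsigned int uint;

typedef struct MT_tables_t {
    uint recursion[MT_TABLE_SIZE];
    uint tempering[MT_TABLE_SIZE];
} MT_tables_t;

// Size of one pointer to a table: 4 bytes on 32-bit, 8 bytes on 64-bit builds.
size_t pointer_bytes() { return sizeof(MT_tables_t *); }

// Size of the buffer that such a pointer refers to when it holds `num` tables.
size_t buffer_bytes(size_t num) { return num * sizeof(MT_tables_t); }

// Hypothetical helper: prints both sizes side by side.
void print_sizes(size_t num) {
    assert(num > 0);
    printf("pointer: %zu bytes, buffer of %zu tables: %zu bytes\n",
           pointer_bytes(), num, buffer_bytes(num));
}
```

Calling print_sizes(32) from main shows that the third argument of the cudaMemcpyToSymbol call covers only the pointer, while size_tbl covers the whole array.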

Thanks for the answer!

I found a thread in another forum that makes it quite clear what the programmer intended.

Here is the “interesting” part of the post:


To my knowledge you could also cudaMemcpyToSymbol a pointer to a pointer (unlike your example, where you’re copying an array to a pointer), but beware only that pointer will be constant, not the memory it’s pointing to on your device. If you were going to go this route, you would need to add a cudaMalloc, then cudaMemcpyToSymbol the resulting ptr to device memory to your constant device var. AGAIN, in this case the array values WILL NOT be constant – ONLY the pointer to the memory will be.

Your call for this case would be something like:

int *d_dic;
cudaMalloc((void **) &d_dic, num_codewords * (bSizeY * bSizeX) * sizeof(int));
cudaMemcpyToSymbol(c_dic_ptr, &d_dic, sizeof(int *));

I suppose the idea is to speed up access to the data on the device, although only the pointer itself sits in constant memory.
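Applied to the Mersenne Twister code from my first post, the pattern would look roughly like the sketch below. The struct, d_MT_tables and MT_tables come from the original program; the value of MT_TABLE_SIZE, the kernel read_first_entries and the function run_demo are hypothetical and only there to show that the kernel can reach the whole array through the constant pointer.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define MT_TABLE_SIZE 16  // assumed value, for illustration only

typedef unsigned int uint;

typedef struct MT_tables_t {
    uint recursion[MT_TABLE_SIZE];
    uint tempering[MT_TABLE_SIZE];
} MT_tables_t;

// Only this pointer lives in constant memory; the tables stay in global memory.
__constant__ MT_tables_t *MT_tables;

// Hypothetical kernel: each thread reads one table through the constant pointer.
__global__ void read_first_entries(uint *out, uint num) {
    uint i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num) out[i] = MT_tables[i].recursion[0];
}

// Hypothetical driver: returns 0 if every table was readable from the kernel.
int run_demo(uint num) {
    assert(num > 0);
    size_t size_tbl = num * sizeof(MT_tables_t);
    MT_tables_t *h_MT_tables = (MT_tables_t *)calloc(num, sizeof(MT_tables_t));
    uint *h_out = (uint *)calloc(num, sizeof(uint));
    if (!h_MT_tables || !h_out) { free(h_MT_tables); free(h_out); return 1; }
    for (uint i = 0; i < num; ++i) h_MT_tables[i].recursion[0] = 100 + i;

    MT_tables_t *d_MT_tables = NULL;
    uint *d_out = NULL;
    bool ok = cudaMalloc(&d_MT_tables, size_tbl) == cudaSuccess;
    ok = ok && cudaMalloc(&d_out, num * sizeof(uint)) == cudaSuccess;
    // Step 1: copy the whole array into global memory.
    ok = ok && cudaMemcpy(d_MT_tables, h_MT_tables, size_tbl,
                          cudaMemcpyHostToDevice) == cudaSuccess;
    // Step 2: copy only the device address (one pointer) into the constant symbol.
    ok = ok && cudaMemcpyToSymbol(MT_tables, &d_MT_tables,
                                  sizeof(MT_tables_t *)) == cudaSuccess;
    if (ok) {
        read_first_entries<<<(num + 127) / 128, 128>>>(d_out, num);
        ok = cudaGetLastError() == cudaSuccess;
    }
    // This blocking copy also reports errors raised while the kernel ran.
    ok = ok && cudaMemcpy(h_out, d_out, num * sizeof(uint),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (uint i = 0; ok && i < num; ++i) ok = (h_out[i] == 100 + i);

    cudaFree(d_out);
    cudaFree(d_MT_tables);
    free(h_out);
    free(h_MT_tables);
    return ok ? 0 : 1;
}
```

On a machine with a working CUDA device, calling run_demo(8) from main should return 0. That would explain why d_MT_tables never appears in the kernel: its value is what MT_tables holds on the device.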