Hi! This is my first post here on the forum. I am analyzing a program written in CUDA and I have a question about its use of cudaMemcpy and cudaMemcpyToSymbol.
The problem consists of transferring some parameters needed to generate random numbers with a Mersenne Twister random number generator for GPUs. The parameters are calculated on the host, stored in an array, and then transferred to the GPU. I will not post the entire code because it is huge, only the part that does the transfer:
// allocate device memory for tables
cudaMalloc(&d_MT_tables, size_tbl); ce(3002);
// copy tables to device
cudaMemcpy(d_MT_tables, h_MT_tables, size_tbl, cudaMemcpyHostToDevice); ce(3003);
cudaMemcpyToSymbol(MT_tables, &d_MT_tables, sizeof(MT_tables_t*)); ce(3004);
h_MT_tables is an array of MT_tables_t structs. The struct definition and the host allocation are:
typedef struct MT_tables_t {
uint recursion[MT_TABLE_SIZE];
uint tempering[MT_TABLE_SIZE];
} MT_tables_t;
uint size_tbl = num * sizeof(MT_tables_t);
MT_tables_t *h_MT_tables = (MT_tables_t *)malloc(size_tbl);
d_MT_tables and MT_tables are pointers and they are defined as:
MT_tables_t *d_MT_tables;
__constant__ MT_tables_t *MT_tables;
I understand the first part of the transfer: cudaMemcpy copies the parameters stored in h_MT_tables to the GPU memory area pointed to by d_MT_tables. The number of bytes copied is size_tbl, which is the size of h_MT_tables, so the entire array is transferred.
The second transfer, the one using cudaMemcpyToSymbol, is where I have questions. I have read that it must be used when one wants to transfer data to the constant memory space of the device, which seems to be the case here. But there are two problems. First, only a small portion of the original data is transferred to MT_tables, because sizeof(MT_tables_t*) is far smaller than size_tbl (I checked in Visual Studio). Second, and this is what makes things weird, d_MT_tables is NEVER used in the kernel, only MT_tables.
So the question arises: what did the programmer intend here? It seems weird that the entire h_MT_tables is transferred to the device, but then only the very small portion saved in MT_tables is actually used by the kernel. Thanks for your help.
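One reading I have not been able to confirm: the source argument of the third call is &d_MT_tables, so the sizeof(MT_tables_t*) bytes being copied might be the pointer value itself rather than any table data. To test that idea without a GPU I wrote the host-only sketch below. It is only an analogy: std::malloc and std::memcpy stand in for cudaMalloc, cudaMemcpy and cudaMemcpyToSymbol, the MT_TABLE_SIZE value is made up, and upload_tables and read_recursion are hypothetical helper names that do not exist in the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Assumed value for illustration; the real MT_TABLE_SIZE is not shown above.
constexpr std::size_t MT_TABLE_SIZE = 32;

typedef unsigned int uint;

typedef struct MT_tables_t {
    uint recursion[MT_TABLE_SIZE];
    uint tempering[MT_TABLE_SIZE];
} MT_tables_t;

// Host stand-in for the __constant__ device pointer.
static MT_tables_t *MT_tables = nullptr;

// Hypothetical helper mirroring the three calls from the post, on the host only.
// Returns the "device" buffer so the caller can free it.
MT_tables_t *upload_tables(const MT_tables_t *h_MT_tables, std::size_t num) {
    assert(h_MT_tables != nullptr);
    const std::size_t size_tbl = num * sizeof(MT_tables_t);

    // Stand-in for cudaMalloc: one buffer large enough for every table.
    MT_tables_t *d_MT_tables = static_cast<MT_tables_t *>(std::malloc(size_tbl));
    if (d_MT_tables == nullptr) {
        return nullptr;
    }

    // Stand-in for cudaMemcpy: all size_tbl bytes of table data are copied.
    std::memcpy(d_MT_tables, h_MT_tables, size_tbl);

    // Stand-in for cudaMemcpyToSymbol: the source is &d_MT_tables, so the
    // sizeof(MT_tables_t *) bytes copied are the address held in d_MT_tables.
    std::memcpy(&MT_tables, &d_MT_tables, sizeof(MT_tables_t *));

    return d_MT_tables;
}

// Stand-in for kernel code: it only ever touches MT_tables.
uint read_recursion(std::size_t table, std::size_t i) {
    return MT_tables[table].recursion[i];
}
```

In this analogy, after upload_tables runs, MT_tables and d_MT_tables hold the same address, so code that reads through MT_tables alone can still reach every table. If the CUDA version behaves the same way, the small copy would not be truncating the data, and d_MT_tables would only be needed on the host side. I would appreciate confirmation on whether that is really what happens on the device.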