Hi everyone.
Here’s my problem, which I’ve been stuck on for the last 2 days: in short, I need to copy a large amount of data as a 1D array to GPU memory, and from there I need to take chunks out of the array and put them into other, smaller arrays, still on the GPU.
I’ve been trying to solve this with cudaMemcpyFromSymbol/cudaMemcpyToSymbol, and although I don’t get any errors, the data is never modified by these calls.
As a small example of what I’m trying to do:
[codebox]
#include <stdio.h>
#include <stdlib.h>

// Sets every element of data[0..length) to 0.
__global__ void setZero(int* data, int length)
{
    int poz = blockIdx.x * blockDim.x + threadIdx.x;
    if(poz >= length)
        return;
    data[poz] = 0;
}

// Adds 5 to every element of data[0..length).
__global__ void addFive(int* data, int length)
{
    int poz = blockIdx.x * blockDim.x + threadIdx.x;
    if(poz >= length)
        return;
    data[poz] += 5;
}

__device__ int *beta, *gamma;
int *alpha;

int main(int argc, char** argv)
{
    // host data
    alpha = (int*)malloc(100 * sizeof(int));
    // device data
    cudaMalloc((void**)&beta, 100 * sizeof(int));
    // device small data chunk
    cudaMalloc((void**)&gamma, 10 * sizeof(int));

    cudaMemcpy(beta, alpha, 100 * sizeof(int), cudaMemcpyHostToDevice);
    setZero<<<2, 50>>>(beta, 100);
    addFive<<<2, 50>>>(beta, 100);

    // ==== //
    cudaMemcpyFromSymbol(gamma, "beta", 10*sizeof(int), 20, cudaMemcpyDeviceToDevice);
    addFive<<<2, 50>>>(gamma, 100);
    cudaMemcpyToSymbol("beta", gamma, 10*sizeof(int), 20, cudaMemcpyDeviceToDevice);
    // === //

    cudaMemcpy(alpha, beta, 100 * sizeof(int), cudaMemcpyDeviceToHost);
    for(int i = 0; i < 100; ++i)
        printf("%d = %d\n", i, alpha[i]);
    return 0;
}
[/codebox]
The example is just to show that I’m trying to move part of the device array ‘beta’ into ‘gamma’, modify it there, and move it back with cudaMemcpyFromSymbol/cudaMemcpyToSymbol. The code between the // ==== // and // === // markers doesn’t affect the data at all: at the end I get an array full of 5’s, nothing extra.
Calling cudaGetErrorString(cudaGetLastError()) after each line of code only returns “no error”. I should also mention that I tried every example I found: with and without &, with __constant__, copying the whole array or just the first position instead of a chunk from the middle, copying from host to device and back instead of device-to-device, and so on.
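For reference, the per-call check described above could be wrapped roughly as follows. This is only a sketch: cudaOk and kernelOk are made-up helper names, not part of the CUDA runtime. It also assumes a runtime that provides cudaDeviceSynchronize. Since kernel launches are asynchronous, cudaGetLastError() right after a launch only reports launch-time problems, so the second helper also waits for the kernel to finish before declaring success.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): reports a failed runtime call.
// Returns true when err is cudaSuccess, false otherwise.
inline bool cudaOk(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical helper: call right after a kernel launch.
// First checks the launch itself, then waits for the kernel so that
// errors raised while it runs are reported too.
inline bool kernelOk(const char* what)
{
    if (!cudaOk(cudaGetLastError(), what))
        return false;
    return cudaOk(cudaDeviceSynchronize(), what);
}
```

With these, a call site would look like cudaOk(cudaMalloc((void**)&ptr, bytes), "cudaMalloc"), and a launch would be followed by kernelOk("addFive").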
It seems something’s wrong with the whole scenario, and I can’t figure out a way past this, since these functions seem to be the only way to copy part of one array into another on the CUDA device.
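As a point of comparison, the same chunk round trip can be sketched with plain cudaMemcpy and pointer arithmetic on the pointers returned by cudaMalloc, without going through the symbol functions. This is a sketch under assumptions, not a verified fix for the code above: it assumes both buffers are ordinary host-side pointer variables filled in by cudaMalloc (not __device__ symbols), bumpChunk is a made-up helper name, and its offset is counted in elements, whereas the symbol functions take a byte offset.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Adds 5 to every element of data[0..length).
__global__ void addFive(int* data, int length)
{
    int poz = blockIdx.x * blockDim.x + threadIdx.x;
    if (poz < length)
        data[poz] += 5;
}

// Hypothetical helper (not a CUDA API). Copies `count` ints starting at
// element `offset` of the device array `big` into the device array `chunk`,
// adds 5 to each element of the chunk, and copies the chunk back.
// Assumes both pointers came from cudaMalloc and that `chunk` holds
// at least `count` ints.
cudaError_t bumpChunk(int* big, int* chunk, int offset, int count)
{
    size_t bytes = count * sizeof(int);

    // device -> device: take the chunk out of the big array
    cudaError_t err = cudaMemcpy(chunk, big + offset, bytes, cudaMemcpyDeviceToDevice);
    if (err != cudaSuccess)
        return err;

    // modify only the chunk; the length passed matches the chunk size
    int threads = 256;
    int blocks = (count + threads - 1) / threads;
    addFive<<<blocks, threads>>>(chunk, count);
    err = cudaGetLastError();
    if (err != cudaSuccess)
        return err;

    // device -> device: put the modified chunk back in place
    return cudaMemcpy(big + offset, chunk, bytes, cudaMemcpyDeviceToDevice);
}
```

With a 100-element device array and a 10-element chunk buffer, bumpChunk(big, chunk, 20, 10) would be expected to change only elements 20 through 29.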