Here’s my problem, which I’ve been stuck on for the last 2 days: in short, I need to copy a large amount of data as a 1D array to GPU memory, and from there I need to take chunks out of the array and put them in other, smaller arrays, still on the GPU.
I’ve been trying to solve this with cudaMemcpyFromSymbol/cudaMemcpyToSymbol, and although I don’t get any errors, I can’t modify the data with these functions.
As a small example of what I’m trying to do:
#include <stdio.h>
#include <stdlib.h>

__global__ void setZero(int* data, int length)
{
    int poz = blockIdx.x * blockDim.x + threadIdx.x;
    if(poz >= length) return;
    data[poz] = 0;
}

__global__ void addFive(int* data, int length)
{
    int poz = blockIdx.x * blockDim.x + threadIdx.x;
    if(poz >= length) return;
    data[poz] += 5;
}

__device__ int *beta, *gamma;

int main(int argc, char* argv[])
{
    // host data
    int* alpha = (int*)malloc(100 * sizeof(int));

    // device data
    cudaMalloc((void**)&beta, 100 * sizeof(int));
    // device small data chunk
    cudaMalloc((void**)&gamma, 10 * sizeof(int));

    cudaMemcpy(beta, alpha, 100 * sizeof(int), cudaMemcpyHostToDevice);
    setZero<<<2, 50>>>(beta, 100);
    addFive<<<2, 50>>>(beta, 100);

    // ==== //
    cudaMemcpyFromSymbol(gamma, "beta", 10*sizeof(int), 20, cudaMemcpyDeviceToDevice);
    addFive<<<2, 50>>>(gamma, 100);
    cudaMemcpyToSymbol("beta", gamma, 10*sizeof(int), 20, cudaMemcpyDeviceToDevice);
    // ==== //

    cudaMemcpy(alpha, beta, 100 * sizeof(int), cudaMemcpyDeviceToHost);
    for(int i = 0; i < 100; ++i)
        printf("%d = %d\n", i, alpha[i]);
    return 0;
}
The example is just to show that I’m trying to move part of the device array ‘beta’ into ‘gamma’, modify it there, and move it back with cudaMemcpyToSymbol/cudaMemcpyFromSymbol. The code between the two // ==== // markers doesn’t affect the data at all: at the end I get an array full of 5’s, nothing else.
Calling cudaGetErrorString(cudaGetLastError()) after each line of code only returns “no error”. I should also mention that I tried every example I found: with and without &, with __constant__, copying the whole array or just the first position instead of a chunk from the middle, copying from host to device and back instead of device to device, etc.
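For what it's worth, checking cudaGetLastError() right after a kernel launch only catches launch errors; a fault raised while the kernel is running is reported by a later synchronizing call. A rough sketch of a stricter check, reusing setZero from the example (the helper name launchChecked and the 64-thread block size are made up for illustration):

```cuda
#include <cuda_runtime.h>

// Same kernel as in the example above.
__global__ void setZero(int* data, int length)
{
    int poz = blockIdx.x * blockDim.x + threadIdx.x;
    if (poz >= length) return;
    data[poz] = 0;
}

// Hypothetical helper: launch setZero on `length` ints and return the first
// error seen, either at launch time or while the kernel was executing.
cudaError_t launchChecked(int* devData, int length)
{
    const int threads = 64;                            // assumed block size
    const int blocks = (length + threads - 1) / threads;

    setZero<<<blocks, threads>>>(devData, length);

    // Errors in the launch itself (bad configuration, invalid arguments).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // Wait for the kernel to finish; execution errors surface here.
    return cudaDeviceSynchronize();
}
```

Even with this, an out-of-bounds write is not guaranteed to be reported, so a clean result is a hint rather than proof that the kernel stayed inside its buffer.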
It seems something’s wrong with the whole approach, and I can’t figure out a way past this, since these functions seem to be the only way to copy part of one array into another on the GPU.
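One alternative that may be worth testing: since both buffers come from cudaMalloc, they are ordinary device pointers, and a plain cudaMemcpy with cudaMemcpyDeviceToDevice accepts an offset pointer such as big + offset. The sketch below assumes the offset is meant in elements (the symbol functions take it in bytes, so the 20 in the example would be element 5); the helper name chunkRoundTrip, its parameters, and the block size are invented for illustration.

```cuda
#include <cuda_runtime.h>

// Same kernel as in the example above.
__global__ void addFive(int* data, int length)
{
    int poz = blockIdx.x * blockDim.x + threadIdx.x;
    if (poz >= length) return;
    data[poz] += 5;
}

// Hypothetical helper: fill `total` ints with 5 on the device, then add 5
// again to elements [offset, offset + chunk) by way of a smaller buffer.
// The final array is copied into hostOut.
cudaError_t chunkRoundTrip(int* hostOut, int total, int offset, int chunk)
{
    int *big = NULL, *small = NULL;
    cudaError_t err;

    err = cudaMalloc((void**)&big, total * sizeof(int));
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void**)&small, chunk * sizeof(int));
    if (err != cudaSuccess) { cudaFree(big); return err; }

    const int threads = 64;  // assumed block size
    cudaMemset(big, 0, total * sizeof(int));
    addFive<<<(total + threads - 1) / threads, threads>>>(big, total);

    // Copy the chunk out using pointer arithmetic on the device pointer.
    cudaMemcpy(small, big + offset, chunk * sizeof(int), cudaMemcpyDeviceToDevice);

    // Modify only the chunk; the length passed matches the small buffer.
    addFive<<<(chunk + threads - 1) / threads, threads>>>(small, chunk);

    // Copy the modified chunk back to the same position.
    cudaMemcpy(big + offset, small, chunk * sizeof(int), cudaMemcpyDeviceToDevice);

    err = cudaMemcpy(hostOut, big, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(small);
    cudaFree(big);
    return err;
}
```

Called as chunkRoundTrip(out, 100, 20, 10) with out pointing to 100 ints, this should leave elements 20 to 29 at 10 and everything else at 5. Note that the second launch passes the chunk length rather than 100, so the kernel never writes past the end of the small buffer.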