cudaMemcpyFromSymbol painful problem

Hi everyone.

Here’s my problem, which I’ve been stuck on for the last two days: in short, I need to copy a large amount of data as a 1D array to GPU memory, and from there I need to take chunks out of that array and put them into other, smaller arrays, still on the GPU.

I’ve been trying to solve this with cudaMemcpyFrom/ToSymbol, and although I don’t get any errors, I can’t modify the data with these functions.

As a small example of what I’m trying to do:

[codebox]
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// set every element of 'data' to zero
__global__ void setZero(int* data, int length)
{
	int poz = blockIdx.x * blockDim.x + threadIdx.x;
	if (poz >= length)
		return;
	data[poz] = 0;
}

// add 5 to every element of 'data'
__global__ void addFive(int* data, int length)
{
	int poz = blockIdx.x * blockDim.x + threadIdx.x;
	if (poz >= length)
		return;
	data[poz] += 5;
}

__device__ int *beta, *gamma;

int *alpha;

int main(int argc, char** argv)
{
	// host data
	alpha = (int*)malloc(100 * sizeof(int));

	// device data
	cudaMalloc((void**)&beta, 100 * sizeof(int));

	// device small data chunk
	cudaMalloc((void**)&gamma, 10 * sizeof(int));

	cudaMemcpy(beta, alpha, 100 * sizeof(int), cudaMemcpyHostToDevice);

	setZero<<<2, 50>>>(beta, 100);
	addFive<<<2, 50>>>(beta, 100);

	// ==== //
	// try to pull 10 ints out of 'beta' (starting at offset 20) into 'gamma',
	// modify them there, and put them back
	cudaMemcpyFromSymbol(gamma, "beta", 10 * sizeof(int), 20, cudaMemcpyDeviceToDevice);
	addFive<<<2, 50>>>(gamma, 100);
	cudaMemcpyToSymbol("beta", gamma, 10 * sizeof(int), 20, cudaMemcpyDeviceToDevice);
	// ==== //

	cudaMemcpy(alpha, beta, 100 * sizeof(int), cudaMemcpyDeviceToHost);

	for (int i = 0; i < 100; ++i)
		printf("%d = %d\n", i, alpha[i]);

	return 0;
}
[/codebox]

The example is just meant to show that I’m trying to move part of the device array ‘beta’ into ‘gamma’, modify it there, and move it back with cudaMemcpyTo/FromSymbol. The code between the // ==== // markers doesn’t affect the data at all: at the end I get an array full of 5’s, nothing extra.

Calling cudaGetErrorString(cudaGetLastError()) after each line of code only returns “no error”, and I should mention that I tried every example I found: with & and without it, with __constant__, copying the whole array or just the first position instead of a chunk from the middle, copying from host to device and back instead of device to device, etc.

It seems something’s wrong with the whole scenario, and I can’t figure out a way past this, since these functions seem to be the only way to copy part of one array into another on the CUDA device.

The constant memory space is read-only from device code. You can only modify the data from the host side.
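
For example, a __constant__ array can only be written from the host, through the symbol API; a rough sketch (the name lut and its size are just for illustration):

[codebox]
__constant__ int lut[16];   // constant memory: readable by kernels, writable only from the host

void uploadLut(const int* hostLut)
{
	// host-side update of the constant space via the symbol API
	cudaMemcpyToSymbol(lut, hostLut, 16 * sizeof(int));
}
[/codebox]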

So what else can I do in that case? I don’t really need to modify the constant memory space device-side, but I do need to use it as a buffer from which to take smaller pieces of arrays. It would be plain stupid to send the separate pieces directly from the host to the device, mainly because they overlap heavily, and this is also why I need functions similar to cudaMemcpyTo/FromSymbol, since they can copy parts of an array.

Maybe I don’t understand exactly what you are trying to do, but why can’t you just use cudaMemcpy exclusively and forget about constant memory? It can copy smaller pieces of an array…
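
For example, something like this (untested, using the beta and gamma pointers from your code) copies the first 10 ints of beta into gamma entirely on the device:

[codebox]
cudaMemcpy(gamma, beta, 10 * sizeof(int), cudaMemcpyDeviceToDevice);
[/codebox]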

Because I need to specify the offset from where to start copying, not just copy from position 0. While in C/C++ I’d probably do something like memcpy(dest, source + offset, size), I understood that working directly with pointers on the CUDA device is somewhere between “not recommended” and “plain forbidden”.

That’s the difference I noticed between the normal cudaMemcpy and cudaMemcpyTo/FromSymbol: being able to specify an offset for the copy.

The specification says that the cudaMemcpyTo/FromSymbol functions work with the cudaMemcpyDeviceToDevice flag, and I used the functions from host code, not device code, so technically, being the host, I can modify the constant space any way I like. For example, I already modify it in the first place by copying from host to device.
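
As far as I can tell from the manual, the symbol argument has to name a variable that itself lives in device (or constant) memory, and the offset is counted in bytes from the start of that variable’s storage; a rough sketch with a made-up statically sized __device__ array buf and a cudaMalloc’ed buffer d_chunk:

[codebox]
__device__ int buf[100];                 // the symbol itself owns the storage

// copy 10 ints out of buf, starting at element 20 (offset given in bytes),
// into the device buffer d_chunk
cudaMemcpyFromSymbol(d_chunk, buf, 10 * sizeof(int),
                     20 * sizeof(int), cudaMemcpyDeviceToDevice);

// ...modify d_chunk with a kernel, then write it back to the same position
cudaMemcpyToSymbol(buf, d_chunk, 10 * sizeof(int),
                   20 * sizeof(int), cudaMemcpyDeviceToDevice);
[/codebox]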

You can do that. You can’t dereference or otherwise manipulate device pointers in host code, but you can calculate an offset in linear device memory from a device pointer. You just have to make sure you understand padding and alignment words in GPU storage, if they exist.
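
For example, assuming beta and gamma are plain int* device pointers obtained from cudaMalloc (declared without __device__), the chunk copy from your example could look roughly like this:

[codebox]
// pull 10 ints out of beta, starting at element 20, into gamma (device to device)
cudaMemcpy(gamma, beta + 20, 10 * sizeof(int), cudaMemcpyDeviceToDevice);

addFive<<<1, 10>>>(gamma, 10);          // modify the chunk in place

// write the modified chunk back into beta at the same position
cudaMemcpy(beta + 20, gamma, 10 * sizeof(int), cudaMemcpyDeviceToDevice);
[/codebox]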

yepp