How to memcpy a single pointer into a double pointer?

cudaMancpy · August 7, 2019, 6:15am

hello.

I am confused about memcpy.

unsigned long long int ** dev_double;
unsigned long long int * dev_single;

err = cudaMalloc(&dev_double, sizeof(unsigned long long int *) * level);
if (err != cudaSuccess) {
	fprintf(stderr, "Failed to allocate double (error code %s)!\n", cudaGetErrorString(err));
	exit(EXIT_FAILURE);
}

err = cudaMalloc(&dev_single, sizeof(unsigned long long int) * size);
if (err != cudaSuccess) {
	fprintf(stderr, "Failed to allocate single (error code %s)!\n", cudaGetErrorString(err));
	exit(EXIT_FAILURE);
}

for (size_t i = 0; i < level; i++)	{
	err = cudaMemcpy(dev_double[level], &dev_single[_size_array], sizeof(unsigned long long int *), cudaMemcpyDeviceToDevice);
	if (err != cudaSuccess) {
		fprintf(stderr, "Failed to memcpy dev_single to dev_double (error code %s)!\n", cudaGetErrorString(err));
		exit(EXIT_FAILURE);
	}

	_size_array += jump_space;
}

dev_single is a flat array, and dev_double is an index of pointers into parts of dev_single.

I want to store addresses inside dev_single in the entries of dev_double.
Is my code right?
I found guides on double pointers, but most of the code in them copies from host to device.
I don't think I need a host copy, because I will later copy another data structure that is updated through the double pointer.

nelion2 · August 9, 2019, 7:08pm

In theory the idea behind your code looks fine to me. However, I would still recommend executing the for loop as a kernel rather than as host code.

err = cudaMemcpy(dev_double[level], &dev_single[_size_array], sizeof(unsigned long long int *), cudaMemcpyDeviceToDevice);

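You are indexing a device array on the host. Taking an address such as &dev_single[_size_array] is only pointer arithmetic and yields the same device address, but dev_double[level] actually reads device memory from host code, which is not valid for memory allocated with cudaMalloc and should be avoided.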

Executing the for loop as a kernel is also probably faster, because the pointer assignments can run in parallel and no cudaMemcpy is needed.
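
A minimal sketch of the kernel approach, not a definitive implementation. The names set_row_pointers and fill_pointer_table_with_kernel are hypothetical, the sizes 4 and 8 are made up, and it assumes the first sub-array starts at offset 0 and every sub-array is jump_space elements long. Each thread writes one entry of dev_double, and the host only reads the table back to check it.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread writes one entry of the pointer table: table[i] = base + i * stride.
__global__ void set_row_pointers(unsigned long long **table,
                                 unsigned long long *base,
                                 size_t stride, size_t count)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < count) {
        table[i] = base + i * stride;
    }
}

// Hypothetical helper: returns the number of wrong entries, or -1 on a CUDA error.
int fill_pointer_table_with_kernel(size_t level, size_t jump_space)
{
    unsigned long long **dev_double = nullptr;
    unsigned long long *dev_single = nullptr;

    if (cudaMalloc(&dev_double, sizeof(unsigned long long *) * level) != cudaSuccess) {
        return -1;
    }
    if (cudaMalloc(&dev_single, sizeof(unsigned long long) * level * jump_space) != cudaSuccess) {
        cudaFree(dev_double);
        return -1;
    }

    // One thread per table entry.
    const unsigned int threads = 128;
    const unsigned int blocks = (unsigned int)((level + threads - 1) / threads);
    set_row_pointers<<<blocks, threads>>>(dev_double, dev_single, jump_space, level);

    // Check both the launch and the kernel execution.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }

    // Read the table back so the host can compare the stored addresses.
    std::vector<unsigned long long *> host_copy(level, nullptr);
    if (err == cudaSuccess) {
        err = cudaMemcpy(host_copy.data(), dev_double,
                         sizeof(unsigned long long *) * level,
                         cudaMemcpyDeviceToHost);
    }

    int mismatches = -1;
    if (err == cudaSuccess) {
        mismatches = 0;
        for (size_t i = 0; i < level; i++) {
            // Host-side pointer arithmetic only; nothing is dereferenced here.
            if (host_copy[i] != dev_single + i * jump_space) {
                mismatches++;
            }
        }
    } else {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFree(dev_single);
    cudaFree(dev_double);
    return mismatches;
}

int main()
{
    int mismatches = fill_pointer_table_with_kernel(4, 8);  // made-up sizes
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The kernel never needs the host to touch dev_double directly, which avoids the host-side indexing problem entirely.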

cudaMancpy · August 12, 2019, 8:52am

This is the correct code:

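// Inside the for loop from the first post, with i as the loop index.
// Computing the address on the host is only pointer arithmetic; nothing on the device is read.
unsigned long long int * temp_ptr = &dev_single[_size_array];
// Copy the pointer value itself (held in host memory) into slot i of the device pointer table.
err = cudaMemcpy(&dev_double[i], &temp_ptr, sizeof(unsigned long long int *), cudaMemcpyHostToDevice);
if (err != cudaSuccess) {
	fprintf(stderr, "Failed to memcpy dev_single (error code %s)!\n", cudaGetErrorString(err));
	exit(EXIT_FAILURE);
}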
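
The per-entry copy above can also be batched. The following is a rough sketch under assumptions, not the only way to do it: the helper name fill_pointer_table_from_host is hypothetical, the sizes are made up, and the sub-arrays are assumed to start at offset 0 and be jump_space elements apart. It builds the whole pointer table in host memory and sends it with a single cudaMemcpy instead of one call per entry.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of wrong entries, or -1 on a CUDA error.
int fill_pointer_table_from_host(size_t level, size_t jump_space)
{
    unsigned long long **dev_double = nullptr;
    unsigned long long *dev_single = nullptr;

    if (cudaMalloc(&dev_double, sizeof(unsigned long long *) * level) != cudaSuccess) {
        return -1;
    }
    if (cudaMalloc(&dev_single, sizeof(unsigned long long) * level * jump_space) != cudaSuccess) {
        cudaFree(dev_double);
        return -1;
    }

    // Build the table on the host. The values are device addresses, but
    // computing them is plain pointer arithmetic and dereferences nothing.
    std::vector<unsigned long long *> host_table(level);
    for (size_t i = 0; i < level; i++) {
        host_table[i] = dev_single + i * jump_space;
    }

    // One host-to-device copy for the whole table.
    cudaError_t err = cudaMemcpy(dev_double, host_table.data(),
                                 sizeof(unsigned long long *) * level,
                                 cudaMemcpyHostToDevice);

    // Read the table back to confirm the device holds the same addresses.
    std::vector<unsigned long long *> check(level, nullptr);
    if (err == cudaSuccess) {
        err = cudaMemcpy(check.data(), dev_double,
                         sizeof(unsigned long long *) * level,
                         cudaMemcpyDeviceToHost);
    }

    int mismatches = -1;
    if (err == cudaSuccess) {
        mismatches = 0;
        for (size_t i = 0; i < level; i++) {
            if (check[i] != host_table[i]) {
                mismatches++;
            }
        }
    } else {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaFree(dev_single);
    cudaFree(dev_double);
    return mismatches;
}

int main()
{
    int mismatches = fill_pointer_table_from_host(4, 8);  // made-up sizes
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The direction is cudaMemcpyHostToDevice because the pointer values being copied live in host memory, even though they point into device memory.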

So in your opinion, you recommend using a kernel to set the addresses, right?