A simple memcpy implementation in OpenACC


I have a simple memcpy implementation. The program compiles, but it fails at run time with a data movement error. Here is the code:

// memorycopy.cpp : A simple memcpy implementation

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

struct A
{
	char name[40];
	int age;
} person, person_copy;

template<typename T> void mymemcpy(T dest, T src, size_t src_size)
{
#pragma acc kernels loop independent pcopyin(src[0:src_size]) pcopyout(dest[0:src_size])
	for (size_t i = 0; i < src_size; i++)
	{
		*(dest+i) = *(src+i);
		//dest[i] = src[i];
	}
}

//void mymemcpy(A*  dest, A*  src, size_t src_size)
//#pragma acc kernels loop independent copyin(src[0:src_size]) copy(dest[0:src_size])
//	for (size_t i = 0; i < src_size; i++)
//	{
//		//*(dest+i) = *(src+i);
//		dest[i] = src[i];
//	}

int main(int argc, char* argv[])
{
	char myname[] = "Pierre de Fermat";

	//using memcpy to copy string
	mymemcpy(person.name, myname, strlen(myname) + 1);
	person.age = 46;

	mymemcpy(&person_copy, &person, sizeof(person));
	printf("mymemcpy->person_copy:%s, %d\n", person_copy.name, person_copy.age);

	return 0;
}

I get the following error when I run the program, built with the pgcpp 15.3 compiler on Windows:

FATAL ERROR: variable in data clause is partially present on the device: name=_2
 file:D:\Projects\HelloWorld\memorycopy\memorycopy.cpp mymemcpy__tm__4_P1A__FZ1Z
T1UL_v line:16
_22666_38_dest lives at 0000000140087260 size 1936 partially present
Present table dump for device[1]: NVIDIA Tesla GPU 1, compute capability 3.0
host:0000000140087220 device:0000000B01140000 size:1936 presentcount:1 line:16 n

Do you have any idea what is going wrong?

Best Regards,

Hi hyuzuguzel,

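Looking at the compiler feedback messages, your second call to mymemcpy instantiates the template with the type “A*”. Hence “src” and “dest” are pointers to an “A” struct, and the array sections in the data clauses are counted in elements of A. When you copy them in, you are copying a 44-element array of A structs (44 x 44 = 1936 bytes, the size reported in the error), not 44 bytes.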

void mymemcpy<A *>(T1, T1, unsigned long):
     16, Generating copyin(src[:src_size])
         Generating copyout(dest[:src_size])
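
To make the size arithmetic concrete, here is a minimal sketch in plain C++ (no OpenACC needed). The helper name clause_bytes is hypothetical and only for illustration; it computes how many bytes an array section ptr[0:count] spans when ptr has type T*. It assumes a typical platform where int is 4 bytes, so that sizeof(A) is 44.

```cpp
#include <cstddef>

// Same layout as the struct in the question.
struct A {
    char name[40];
    int age;
};

// Hypothetical helper: bytes spanned by an array section ptr[0:count]
// when ptr has type T*. The section is counted in elements of T,
// so the byte size is count * sizeof(T).
template <typename T>
std::size_t clause_bytes(const T* /*ptr*/, std::size_t count) {
    return count * sizeof(T);
}
```

Called as clause_bytes(&person, sizeof(person)), this gives sizeof(A) * sizeof(A) bytes, while casting the pointer to “const char *” first gives only sizeof(A) bytes, which is what the copy actually needs.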

To fix this, don’t use a template; instead, explicitly pass the pointers in as “char *”.

Note that there is a device memcpy routine in the OpenACC 2.5 standard. While the standard isn’t out yet, we have it implemented as of our 15.1 release.

acc_memcpy_device(dest device ptr, src device ptr, size in bytes);

PGI also added “acc_memcpy”. The difference is that it takes host pointers, while “acc_memcpy_device” expects device pointers. In both cases, it’s the device memory that gets copied; host memory is untouched. Also, the dest and src variables need to have device copies, so the call should be enclosed in a data region.

acc_memcpy(dest, src, size in bytes);

Hope this helps,