Hello,
I have a std::vector-like class (limited to structures that can be allocated with a plain malloc) that can upload a copy of itself to the CUDA device and download the data back after any modification (on demand). My problem is that whenever the class has to reallocate its array, cudaMemcpy fails to download the data back.
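For reference, the relevant members of the class look roughly like this (a simplified sketch; exact types and access specifiers are guesses, and only the parts the two functions below touch are shown):

template<typename Type, unsigned int sizeOnStack>
class CuVector {
public:
    __device__ __host__ Type& push(const Type& value);
    bool updateFromClone();
    // ... constructors, upload, etc. omitted ...
private:
    Type stackData[sizeOnStack]; // small in-place buffer used before the first reallocation
    Type *data;                  // points either to stackData or to a heap allocation
    unsigned int usedSize;       // number of stored elements
    unsigned int allocSize;      // current capacity of data
    CuVector *clone;             // device-side copy of this object
};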
Here’s the implementation of both functions:
template<typename Type, unsigned int sizeOnStack>
// Adds an element to the CuVector
__device__ __host__ inline Type& CuVector<Type, sizeOnStack>::push(const Type& value){
    if (usedSize >= allocSize){
        allocSize *= 2;
        Type *newData = new Type[allocSize];
        if (newData == NULL) return(data[usedSize - 1]);
        for (int i = 0; i < usedSize; i++)
            newData[i] = data[i];
        // Free the old buffer only if it was heap-allocated (not the in-place stack buffer)
        if ((data != stackData) && data != NULL) delete[] data;
        data = newData;
        printf("allocated\n"); // just to confirm the kernel reached this line
    }
    data[usedSize] = value;
    usedSize++;
    return(data[usedSize - 1]);
}
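The reallocation branch gets hit from device code roughly like this (the kernel below is only an illustration, not my actual one):

__global__ void pushKernel(CuVector<int, 4> *vec, int count){
    // Single-threaded on purpose; pushing past the initial capacity
    // forces the device-side new in push()
    if (threadIdx.x == 0 && blockIdx.x == 0){
        for (int i = 0; i < count; i++)
            vec->push(i);
    }
}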
template<typename Type, unsigned int sizeOnStack>
// Loads data from the GPU clone (returns true if successful)
inline bool CuVector<Type, sizeOnStack>::updateFromClone(){
    if (clone == NULL) return(false);
    // Raw byte buffer on the host, large enough to hold the object header
    char cln[sizeof(CuVector<Type, sizeOnStack>)];
    CuVector *cpuClone = (CuVector*)cln;
    if (cudaMemcpy(cpuClone, clone, sizeof(CuVector<Type, sizeOnStack>), cudaMemcpyDeviceToHost) != cudaSuccess){
        std::cout << "Failed to load clone..." << std::endl;
        return(false);
    }
    Type *newData = NULL;
    // clone->stackData is only pointer arithmetic on the device address (no dereference),
    // so this checks whether the clone's data still lives in its in-place buffer
    if (cpuClone->data != clone->stackData){
        newData = new Type[cpuClone->allocSize];
        if (newData == NULL) return(false);
        if (cudaMemcpy(newData, cpuClone->data, sizeof(Type)*cpuClone->allocSize, cudaMemcpyDeviceToHost) != cudaSuccess){
            std::cout << "Failed to load clone data..." << std::endl; // I see this message if and only if push() above has reallocated the data
            std::cout << cudaGetErrorString(cudaGetLastError()) << std::endl;
            return(false);
        }
    }
    if ((data != stackData) && data != NULL) delete[] data;
    usedSize = cpuClone->usedSize;
    allocSize = cpuClone->allocSize;
    if (newData == NULL){
        data = stackData;
        for (int i = 0; i < usedSize; i++)
            data[i] = cpuClone->stackData[i];
    }
    else data = newData;
    return(true);
}
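On the host side, the sequence that reproduces the failure is roughly this (simplified; uploadToClone() and getClone() are just placeholder names for whatever creates and exposes the device-side copy):

CuVector<int, 4> vec;                      // small on-stack capacity so push() has to reallocate
vec.uploadToClone();                       // create/refresh the device-side clone
pushKernel<<<1, 1>>>(vec.getClone(), 16);  // push well past sizeOnStack -> device-side new
cudaDeviceSynchronize();
if (!vec.updateFromClone())                // fails with "Failed to load clone data..."
    std::cout << "download failed" << std::endl;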
What might be the reason?
Do device-side “new” (called inside a kernel) and host-side “cudaMalloc” operate on different regions of GPU memory?