cudaMalloc issue

Hello.

I have a question about cudaMalloc.
I have a structure of 24 bytes (unsigned int, float, float, unsigned int, pointer).
I already asked a cudaMalloc question before, but I still couldn’t work it out myself, so I need some help.

This structure is used in a PageRank algorithm.
The structure is called vertex.
First, an array of vertex structures is allocated, one per node (vertex).
Second, for each vertex, successors is allocated with one entry per adjacent node (it is just an array of pointers for reaching the adjacent nodes).

I understand malloc in C, but I don’t know how to do the same allocation in CUDA.
I have been stuck on this for three days. Please help me. It’s really hard.

-------------------------------- structure

struct vertex {
	unsigned int vertex_id;
	float pagerank;
	float pagerank_next;
	unsigned int n_successors;
	vertex ** successors;
};

-------------------------------- allocation in C

vertices = (vertex *)malloc(n_vertices*sizeof(vertex));

for (i = 0; i<n_vertices; i++) {
	if (vertices[i].n_successors > 0) {
		vertices[i].successors = (vertex **)malloc(vertices[i].n_successors*sizeof(vertex *));

		if (!vertices[i].successors) {
			fprintf(stderr, "Malloc failed for successors of vertex %d.\n", i);
			exit(-1);
		}
		memset((void *)vertices[i].successors, 0, (size_t)(vertices[i].n_successors * sizeof(vertex *)));
	}
	else
		vertices[i].successors = NULL;
}

-------------------------------- before allocation in CUDA, data setup

while (fscanf(fp, "%d %d", &vertex_from, &vertex_to) == 2) {	/* stop at EOF or on a malformed line */
	for (i = 0; i < vertices[vertex_from].n_successors; i++) {
		if (vertices[vertex_from].successors[i] == NULL) {
			vertices[vertex_from].successors[i] = &vertices[vertex_to];
			break;
		}
		else if (i == vertices[vertex_from].n_successors - 1) {
			printf("Setting up the successor pointers of vertex %u failed\n", vertex_from);
			return -1;
		}
	}
}

-------------------------------- allocation in CUDA

for (i = 0; i<n_vertices; i++) {
	if (vertices[i].n_successors > 0) {
		vertex ** d_testVar;
		cudaMalloc(&d_testVar, vertices[i].n_successors * sizeof(vertex*));
		
		if (!vertices[i].successors) {
			fprintf(stderr, "Malloc failed for successors of vertex %d.\n", i);
			exit(-1);
		}

		cudaMemcpy(&d_testVar, &vertices[i].successors, vertices[i].n_successors * sizeof(vertex*), cudaMemcpyHostToDevice);
		cudaMemcpy(&(d_vertices[i].successors), &d_testVar, sizeof(vertex**), cudaMemcpyDeviceToDevice);
	}
	else
		cudaMemcpy(&(d_vertices[i].successors), &(vertices[i].successors), sizeof(vertex**), cudaMemcpyHostToDevice);
}

Are you doing proper CUDA error checking? It doesn’t look like it.

Sure. This is the error-checking code.

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort = true)
{
	if (code != cudaSuccess)
	{
		fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
		if (abort) exit(code);
	}
}

The message is “invalid argument”, and it comes from this line:
cudaMemcpy(&(d_vertices[i].successors), &d_testVar, sizeof(vertex**), cudaMemcpyDeviceToDevice);

That’s because &d_testVar is a host address (d_testVar is a host variable that holds a device pointer), so the correct copy direction is cudaMemcpyHostToDevice.
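
To illustrate the copy-direction point, here is a minimal sketch. The helper name attach_successor_array is made up for this example and does not appear in the original code, and d_vertices is assumed to have been allocated with cudaMalloc already. It only stores the address of a new device array into one vertex; it does not fill the array.

```cuda
#include <cuda_runtime.h>

struct vertex {
	unsigned int vertex_id;
	float pagerank;
	float pagerank_next;
	unsigned int n_successors;
	vertex **successors;
};

// Hypothetical helper (not from the original post): allocate a device array of
// n_successors pointers and store its address in d_vertices[i].successors.
// d_vertices must already point to device memory obtained from cudaMalloc.
cudaError_t attach_successor_array(vertex *d_vertices, unsigned int i,
                                   unsigned int n_successors,
                                   vertex ***d_array_out)
{
	vertex **d_array = NULL;
	cudaError_t err = cudaMalloc((void **)&d_array,
	                             n_successors * sizeof(vertex *));
	if (err != cudaSuccess) return err;

	// d_array is an ordinary host variable whose VALUE is a device address.
	// &d_array is therefore a host address: the source of this copy lives in
	// host memory, so the direction is host to device.
	// &d_vertices[i].successors is only address arithmetic on the device
	// pointer; the host does not dereference device memory here.
	err = cudaMemcpy(&d_vertices[i].successors, &d_array, sizeof(vertex **),
	                 cudaMemcpyHostToDevice);
	if (err != cudaSuccess) {
		cudaFree(d_array);
		return err;
	}

	*d_array_out = d_array;
	return cudaSuccess;
}
```

The returned error code can be passed to whatever checking macro the program already uses.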

You have a much bigger issue on the previous line. It has to do with fixup/conversion of host pointers to device pointers:

cudaMemcpy(&d_testVar, &vertices[i].successors, vertices[i].n_successors * sizeof(vertex*), cudaMemcpyHostToDevice);

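The &d_testVar is incorrect there; it should just be d_testVar, since d_testVar already holds the device address returned by cudaMalloc (and you should have gotten an error about that as well). By the same reasoning, the source should be vertices[i].successors rather than &vertices[i].successors.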

But the bigger issue is that the pointers you are copying are presumably pointers to host addresses, and they will be useless on the device/in device code. They need to be converted.
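
One way the conversion could look is sketched below. It relies on the fact, visible in the setup code, that every successor pointer is of the form &vertices[k], so the matching device address is d_vertices + k. The function name copy_graph_to_device and the CUDA_CHECK macro are made up for this sketch; this is not necessarily the code the original poster ended up with.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

struct vertex {
	unsigned int vertex_id;
	float pagerank;
	float pagerank_next;
	unsigned int n_successors;
	vertex **successors;
};

// Hypothetical error-check macro for this sketch only.
#define CUDA_CHECK(call) do { \
	cudaError_t err_ = (call); \
	if (err_ != cudaSuccess) { \
		fprintf(stderr, "CUDA error: %s at %s:%d\n", \
		        cudaGetErrorString(err_), __FILE__, __LINE__); \
		exit(EXIT_FAILURE); \
	} \
} while (0)

// Hypothetical helper: deep-copy the host graph to the device and return the
// device vertex array. Assumes each non-NULL successor points into vertices[].
vertex *copy_graph_to_device(const vertex *vertices, unsigned int n_vertices)
{
	vertex *d_vertices = NULL;
	CUDA_CHECK(cudaMalloc((void **)&d_vertices, n_vertices * sizeof(vertex)));

	// Host-side staging copy; its successors fields will hold device addresses.
	std::vector<vertex> staged(vertices, vertices + n_vertices);

	for (unsigned int i = 0; i < n_vertices; i++) {
		unsigned int n = vertices[i].n_successors;
		staged[i].successors = NULL;
		if (n == 0) continue;

		// Translate each host pointer &vertices[k] into &d_vertices[k].
		std::vector<vertex *> fixed(n, (vertex *)NULL);
		for (unsigned int j = 0; j < n; j++) {
			const vertex *h = vertices[i].successors[j];
			if (h != NULL) fixed[j] = d_vertices + (h - vertices);
		}

		vertex **d_succ = NULL;
		CUDA_CHECK(cudaMalloc((void **)&d_succ, n * sizeof(vertex *)));
		// The destination is d_succ itself (a device address), not &d_succ.
		CUDA_CHECK(cudaMemcpy(d_succ, fixed.data(), n * sizeof(vertex *),
		                      cudaMemcpyHostToDevice));
		staged[i].successors = d_succ;
	}

	// One bulk copy of the vertex array, now carrying device-side pointers.
	CUDA_CHECK(cudaMemcpy(d_vertices, staged.data(),
	                      n_vertices * sizeof(vertex), cudaMemcpyHostToDevice));
	return d_vertices;
}
```

Staging the fixed-up structs on the host keeps the number of copies to one per successor array plus one for the vertex array, instead of patching each successors field on the device separately.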

That’s as far as I’ll go without a complete test code to look at. The complexity here is what causes many people to think about alternate data organization schemes.
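
The reply does not say which alternate scheme it has in mind. One commonly used option, shown here only as an illustration, is to replace the per-vertex pointer arrays with integer indices stored in two flat arrays (a compressed sparse row style layout). Indices mean the same thing on host and device, so no pointer conversion is needed. The names flatten_to_csr and push_rank are made up for this sketch, every successor slot is assumed to be filled, and the kernel is a simplified contribution step without damping, not a complete PageRank.

```cuda
#include <vector>
#include <cuda_runtime.h>

struct vertex {
	unsigned int vertex_id;
	float pagerank;
	float pagerank_next;
	unsigned int n_successors;
	vertex **successors;
};

// Hypothetical helper: flatten the pointer-based graph into two index arrays.
// The successors of vertex v are targets[offsets[v]] .. targets[offsets[v+1]-1].
// Assumes every successor pointer is non-NULL and points into vertices[].
void flatten_to_csr(const vertex *vertices, unsigned int n_vertices,
                    std::vector<unsigned int> &offsets,
                    std::vector<unsigned int> &targets)
{
	offsets.assign(n_vertices + 1, 0);
	targets.clear();
	for (unsigned int v = 0; v < n_vertices; v++) {
		offsets[v] = (unsigned int)targets.size();
		for (unsigned int j = 0; j < vertices[v].n_successors; j++) {
			// Store the successor's index instead of its address.
			targets.push_back(
				(unsigned int)(vertices[v].successors[j] - vertices));
		}
	}
	offsets[n_vertices] = (unsigned int)targets.size();
}

// Hypothetical kernel: one thread per vertex adds an equal share of that
// vertex's rank to each of its successors. rank_next must be zeroed first.
__global__ void push_rank(const unsigned int *offsets,
                          const unsigned int *targets,
                          const float *rank, float *rank_next,
                          unsigned int n_vertices)
{
	unsigned int v = blockIdx.x * blockDim.x + threadIdx.x;
	if (v >= n_vertices) return;

	unsigned int begin = offsets[v];
	unsigned int end = offsets[v + 1];
	if (begin == end) return;	// no successors, nothing to distribute

	float share = rank[v] / (float)(end - begin);
	for (unsigned int e = begin; e < end; e++) {
		// Several threads may target the same successor, hence atomicAdd.
		atomicAdd(&rank_next[targets[e]], share);
	}
}
```

With this layout the whole graph moves to the device with one cudaMalloc and one cudaMemcpy per array (offsets, targets, and the rank arrays), regardless of the number of vertices.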

Thanks a lot!
I have found the right code!