How to change from cudaMallocManaged to cudaMalloc

Hello.
I am wondering about allocating pointers.
I have already read a few threads about pointer allocation:
1. https://stackoverflow.com/questions/30082991/memory-allocation-on-gpu-for-dynamic-array-of-structs
2. https://stackoverflow.com/questions/16539085/cuda-dynamic-array-array-malloc-and-copy/16545272#16545272
3. https://stackoverflow.com/questions/23609770/cuda-double-pointer-memory-copy

But they are not the same as my issue,
because my pointers point to the same structure type.

struct vertex {
	unsigned int vertex_id;
	float pagerank;
	float pagerank_next;
	unsigned int n_successors;
	vertex **successors;
};

For example, my structure is named vertex.
First, I allocate one vertex structure per node (if there are 100 nodes, I allocate 100 vertex structures).
Second, I allocate the double pointers (these pointers point to the vertex structures allocated in the first step).

This structure is an adjacency list, and I understand the data structure.
But when I change the code, I get an error (an allocation issue).
The original code uses unified memory allocation (cudaMallocManaged), but I want to use cudaMalloc.
Please help me.
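For reference, the two-step host layout described in the question (the plain C++ version, before any CUDA allocation) might look like the sketch below. `build_graph` and `free_graph` are hypothetical helpers, the edge-list input is an assumption, and so is the initial rank of 1/n; the point is only that every entry of `successors` points back into the single vertex array allocated in step 1.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    vertex **successors;  // each entry points into the same vertex array
};

// Hypothetical helper: builds the pointer-based adjacency list on the host.
// Step 1 allocates one vertex per node; step 2 allocates one vertex* array
// per vertex whose entries point back into the step-1 array.
vertex *build_graph(unsigned int n,
                    const std::vector<std::pair<unsigned int, unsigned int>> &edges) {
    vertex *v = new vertex[n];  // step 1
    for (unsigned int i = 0; i < n; ++i)
        v[i] = vertex{i, 1.0f / n, 0.0f, 0u, nullptr};  // assumed initial rank 1/n

    // count out-degrees so each successor array is allocated exactly once
    for (const auto &e : edges) {
        assert(e.first < n && e.second < n);
        ++v[e.first].n_successors;
    }
    for (unsigned int i = 0; i < n; ++i) {
        v[i].successors = new vertex *[v[i].n_successors];  // step 2
        v[i].n_successors = 0;  // reused as a fill cursor below
    }
    for (const auto &e : edges) {
        vertex &src = v[e.first];
        src.successors[src.n_successors++] = &v[e.second];
    }
    return v;
}

// Releases everything allocated by build_graph.
void free_graph(vertex *v, unsigned int n) {
    for (unsigned int i = 0; i < n; ++i)
        delete[] v[i].successors;
    delete[] v;
}
```

With cudaMallocManaged, both `new` calls can simply be swapped for managed allocations and the pointers stay valid on the GPU; with plain cudaMalloc, the pointers stored in `successors` have to be translated, which is what the rest of this thread is about.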

I think it would be simpler to keep a single array of all vertices and store, in each adjacency list, the indices of the successors in that array.

Something like this:

#include <cuda_runtime.h>
#include <vector>

struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    int* successors; // positions of the successors in all_vertices
};

std::vector<vertex> all_vertices;
//... create the N vertices and save in all_vertices
std::vector<vertex> tmp = all_vertices;

vertex* all_vertices_device;

cudaMalloc(&all_vertices_device, sizeof(vertex) * N);

for(int i = 0; i < N; i++){
   // tmp[i].successors is overwritten with a device pointer; the host list stays in all_vertices
   cudaMalloc(&(tmp[i].successors), sizeof(int) * tmp[i].n_successors);
   cudaMemcpy(tmp[i].successors, all_vertices[i].successors, sizeof(int) * tmp[i].n_successors, cudaMemcpyHostToDevice);
}
// tmp now holds device pointers in every successors field; copy the whole array to the device
cudaMemcpy(all_vertices_device, tmp.data(), sizeof(vertex) * N, cudaMemcpyHostToDevice);
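If the pointer-based graph already exists on the host, it can be converted into this index form by subtracting the array base from each successor pointer; the resulting vertices can then go through the cudaMalloc/cudaMemcpy loop above. A host-only sketch: `ptr_vertex` (the original layout, renamed so both structs fit in one file) and `to_index_form` are hypothetical names, and keeping the index lists in a separate `storage` vector is a design assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The original pointer-based layout (renamed here only to coexist with vertex).
struct ptr_vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    ptr_vertex **successors;
};

// The index-based layout suggested above.
struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    int *successors;  // positions of the successors in all_vertices
};

// Hypothetical helper: turns each successor pointer into its index by
// subtracting the base of the array it points into. The returned vertices'
// successors point into `storage`, which must outlive them.
std::vector<vertex> to_index_form(const ptr_vertex *base, std::size_t n,
                                  std::vector<std::vector<int>> &storage) {
    std::vector<vertex> out(n);
    storage.assign(n, std::vector<int>());
    for (std::size_t i = 0; i < n; ++i) {
        const ptr_vertex &p = base[i];
        for (unsigned int j = 0; j < p.n_successors; ++j) {
            // well-defined: both pointers refer to the same array
            std::ptrdiff_t idx = p.successors[j] - base;
            assert(idx >= 0 && static_cast<std::size_t>(idx) < n);
            storage[i].push_back(static_cast<int>(idx));
        }
        out[i] = vertex{p.vertex_id, p.pagerank, p.pagerank_next,
                        p.n_successors, storage[i].data()};
    }
    return out;
}
```

Because indices mean the same thing on host and device, nothing needs to be translated after the copy; a kernel reads `all_vertices_device[v.successors[j]]` instead of following a pointer.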

Hello, Mr. striker159,
thanks for your help.

Please take a look at this code.

This is PageRank code that uses a linked list as the adjacency list.
A source node links to its destination nodes; each link is a pointer to the same structure type (i.e., struct vertex).

So your suggestion is good, but it doesn't work for my code… T_T
If you know how to allocate vertex **successors in CUDA, please let me know.
I understand how that code allocates vertex **successors in C, but I don't know how to allocate it in CUDA.
That code uses cudaMallocManaged, but I want to use plain cudaMalloc.
Thank you for your kindness. Thank you very much!
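To see why the pointers themselves are the problem, here is a CPU reference for one push-style PageRank iteration over this layout. It is an assumption about how `pagerank` and `pagerank_next` are used (the actual kernel is not shown in the thread), and the damping factor `d` and the choice to simply skip vertices without successors are assumptions too. On the GPU, every `successors[j]->` dereference happens in device code, so with plain cudaMalloc both the `successors` arrays and the pointers stored in them must be device addresses; a parallel kernel doing this push would also need `atomicAdd` on `pagerank_next`.

```cpp
#include <cassert>
#include <cstddef>

struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    vertex **successors;
};

// Hypothetical reference step: each vertex pushes d * rank / out-degree to
// every successor. Vertices with no successors push nothing (a simplification).
void pagerank_step(vertex *v, std::size_t n, float d) {
    assert(n > 0);
    for (std::size_t i = 0; i < n; ++i)
        v[i].pagerank_next = (1.0f - d) / n;
    for (std::size_t i = 0; i < n; ++i) {
        if (v[i].n_successors == 0) continue;
        float share = d * v[i].pagerank / v[i].n_successors;
        for (unsigned int j = 0; j < v[i].n_successors; ++j)
            v[i].successors[j]->pagerank_next += share;  // follows a stored pointer
    }
    for (std::size_t i = 0; i < n; ++i)
        v[i].pagerank = v[i].pagerank_next;
}
```

For two vertices that point only at each other, starting at rank 0.5 each with `d = 0.85`, each receives 0.075 + 0.425, so both ranks stay at 0.5.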

You could try to translate the pointers in the adjacency list manually: find the position of each successor in the host vertex array, then compute the pointer to the same position in the device vertex array, as in the following code. This is untested, though; I have no idea whether it actually works.

#include <cuda_runtime.h>
#include <cstddef>
#include <memory>
#include <vector>

struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    vertex **successors;
};

std::vector<vertex> all_vertices;
//... create the N vertices and save in all_vertices
std::vector<vertex> tmp = all_vertices;

vertex* all_vertices_device;

cudaMalloc(&all_vertices_device, sizeof(vertex) * N);

for (int i = 0; i < N; i++) {
    unsigned int n_suc = tmp[i].n_successors;
    // make a host copy of the adjacency list of vertex i (std::make_unique needs C++14)
    std::unique_ptr<vertex*[]> adj_copy = std::make_unique<vertex*[]>(n_suc);

    // translate each host pointer: take the successor's index in all_vertices
    // and add it to the device base pointer
    for (unsigned int j = 0; j < n_suc; j++) {
        std::ptrdiff_t idx = all_vertices[i].successors[j] - all_vertices.data();
        adj_copy[j] = all_vertices_device + idx;
    }

    // allocate the device adjacency list
    cudaMalloc(&(tmp[i].successors), sizeof(vertex*) * n_suc);
    // copy adj_copy (which contains the device pointers for the adjacency list) to the device
    cudaMemcpy(tmp[i].successors, adj_copy.get(), sizeof(vertex*) * n_suc, cudaMemcpyHostToDevice);
}
// tmp now holds device pointers in every successors field; copy the whole array to the device
cudaMemcpy(all_vertices_device, tmp.data(), sizeof(vertex) * N, cudaMemcpyHostToDevice);
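The pointer translation is the part most worth checking in isolation, and it needs no GPU: it only does arithmetic on the device base pointer and never dereferences it. Taking each successor's index in the host array and adding it to the device base keeps the subtraction between two pointers into the same array, which C++ defines, whereas subtracting pointers into two different allocations is undefined. In the sketch below, `rebase_successors` is a hypothetical helper and a second host array stands in for the cudaMalloc'd one; it returns a `std::vector<vertex*>`, whose `.data()` could be passed to cudaMemcpy in place of `adj_copy.get()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    vertex **successors;
};

// Hypothetical helper: maps v's successor pointers from host_base[0..n)
// to the same positions in device_base[0..n). device_base is never
// dereferenced, so it may be a pointer returned by cudaMalloc.
std::vector<vertex *> rebase_successors(const vertex &v, const vertex *host_base,
                                        std::size_t n, vertex *device_base) {
    std::vector<vertex *> adj_copy(v.n_successors);
    for (unsigned int j = 0; j < v.n_successors; ++j) {
        std::ptrdiff_t idx = v.successors[j] - host_base;  // index in host array
        assert(idx >= 0 && static_cast<std::size_t>(idx) < n);
        adj_copy[j] = device_base + idx;  // same index in the device array
    }
    return adj_copy;
}
```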

std::unique_ptr<vertex*> adj_copy = std::make_unique<vertex*>(n_suc);

There is a syntax error in this line.
What are unique_ptr and make_unique?
I haven't seen code like that before.

One more question:
after the kernel finishes, how do I copy the data from device to host with your approach?

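It is just a buffer (see http://en.cppreference.com/w/cpp/memory/unique_ptr). The line you quoted does not compile because it is missing the array brackets: it has to be std::make_unique<vertex*[]>(n_suc), and std::make_unique itself requires C++14. You could also use std::vector<vertex*> adj_copy(n_suc) instead and, in the cudaMemcpy call, change adj_copy.get() to adj_copy.data().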

To copy from device to host, you do essentially the same thing in reverse: copy the device vertex array into a temporary host vertex array, then copy all data members except the adjacency list (I assume the list is not modified on the GPU) into the original vertex array.
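A sketch of that copy-back step, under the assumption stated above that the adjacency list is not modified on the GPU. In real code, `from_device` would be filled with `cudaMemcpy(from_device.data(), all_vertices_device, sizeof(vertex) * N, cudaMemcpyDeviceToHost)`; its `successors` fields then hold device addresses that must not be dereferenced on the host. `copy_back` is a hypothetical helper. When cleaning up, each device list can be released with `cudaFree(tmp[i].successors)` (tmp still holds those device pointers), followed by `cudaFree(all_vertices_device)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    vertex **successors;
};

// Hypothetical helper: copies every member except the adjacency list from
// the device copy back into the original host array.
void copy_back(std::vector<vertex> &host, const std::vector<vertex> &from_device) {
    assert(host.size() == from_device.size());
    for (std::size_t i = 0; i < host.size(); ++i) {
        host[i].vertex_id = from_device[i].vertex_id;
        host[i].pagerank = from_device[i].pagerank;
        host[i].pagerank_next = from_device[i].pagerank_next;
        host[i].n_successors = from_device[i].n_successors;
        // host[i].successors is left untouched: it still points into host memory
    }
}
```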