how to change from cudaMallocManaged to cudaMalloc

cudaMancpy · January 24, 2018, 7:49am

Hello.
I have a question about allocating pointers that live inside a struct on the GPU.
I have already read a few related questions:
1. https://stackoverflow.com/questions/30082991/memory-allocation-on-gpu-for-dynamic-array-of-structs
2. https://stackoverflow.com/questions/16539085/cuda-dynamic-array-array-malloc-and-copy/16545272#16545272
3. https://stackoverflow.com/questions/23609770/cuda-double-pointer-memory-copy

But none of these matches the problem in my code,
because my pointers point to instances of the same structure type.

struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    vertex **successors;   // array of pointers to other vertex instances
};

For example, my structure is named vertex.
First, I allocate one vertex structure per node (if there are 100 nodes, 100 vertex structures are allocated).
Second, I allocate the double pointers (these pointers point to the vertex structures allocated in the first step).

This structure represents an adjacency list. I understand the data structure itself.
But when I change the code, I get an error (an allocation issue).
The original code uses unified memory allocation (cudaMallocManaged), but I want to use cudaMalloc.
Please help me.

striker159 · January 24, 2018, 12:15pm

I think it would be simpler to keep one list of all vertices and have each adjacency list store the indices of the successors in that list, instead of pointers.

Something like:

struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    int* successors; // positions of the successors in all_vertices
};

std::vector<vertex> all_vertices;
//... create the N vertices and save in all_vertices

// host-side copy; its successors pointers get replaced by device pointers below
std::vector<vertex> tmp = all_vertices;

vertex* all_vertices_device;

cudaMalloc(&all_vertices_device, sizeof(vertex) * N);

for(int i = 0; i < N; i++){
   // allocate the device index list of vertex i and fill it from the host list
   cudaMalloc(&(tmp[i].successors), sizeof(int) * tmp[i].n_successors);
   cudaMemcpy(tmp[i].successors, all_vertices[i].successors, sizeof(int) * tmp[i].n_successors, cudaMemcpyHostToDevice);
}
// copy the vertices, which now hold device pointers, to the device
cudaMemcpy(all_vertices_device, tmp.data(), sizeof(vertex) * N, cudaMemcpyHostToDevice);
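
If the host graph already exists with vertex** successors, the index lists suggested above could be derived from the pointers. The following is a minimal host-only sketch, not code from the thread: successor_indices is a hypothetical helper, and it assumes that every successor pointer points into the same contiguous vertex array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pointer-based vertex, as in the original question.
struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    vertex** successors;
};

// Hypothetical helper (not from the thread): returns, for one vertex, the
// positions of its successors in the array that starts at `base`.
// Assumption: every successors[j] points into that same array of `count` vertices.
std::vector<int> successor_indices(const vertex& v, const vertex* base, std::size_t count) {
    std::vector<int> indices;
    indices.reserve(v.n_successors);
    for (unsigned int j = 0; j < v.n_successors; ++j) {
        // Pointer difference within one array gives the element index.
        std::ptrdiff_t idx = v.successors[j] - base;
        assert(idx >= 0 && static_cast<std::size_t>(idx) < count);
        indices.push_back(static_cast<int>(idx));
    }
    return indices;
}
```

The returned vector's data() could then serve as the host source of the per-vertex cudaMemcpy in the loop above.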

cudaMancpy · January 24, 2018, 1:51pm

Hello, striker159.
Thanks for your help.

Please take a look at this code:

https://github.com/Apple-Of-Eden/GPU-pagerank-linked-list/blob/master/pagerank_linked.cu

#include <stdio.h>

// For the CUDA runtime routines (prefixed with "cuda_")
#include <cuda_runtime.h>


typedef struct vertex vertex;

struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    vertex ** successors;
};

float abs_float(float in) {
  if (in >= 0)
    return in;
  else

This file has been truncated.

This is PageRank code that uses a linked list as the adjacency list.
Each source node links to its destination nodes. A link is a pointer to another instance of the same structure (i.e., struct vertex).

So your suggestion is good, but it does not work for this code.
If you know how to allocate vertex **successors in CUDA, please let me know.
I understand how to allocate vertex **successors in C, but I don't know how to do it in CUDA.
That code uses cudaMallocManaged, but I want to use plain cudaMalloc.
Thanks for your kindness. Thank you very much!

striker159 · January 24, 2018, 8:17pm

You could try translating each host pointer in the adjacency list into the corresponding device pointer manually, based on where the host vertex array and the device vertex array start. Something like the following code, but it is untested; I have no idea if it actually works.

struct vertex {
    	unsigned int vertex_id;
    	float pagerank;
    	float pagerank_next;
    	unsigned int n_successors;
    	vertex **successors;
    };

    std::vector<vertex> all_vertices;
    //... create the N vertices and save in all_vertices
    std::vector<vertex> tmp = all_vertices;

    vertex* all_vertices_device;

    cudaMalloc(&all_vertices_device, sizeof(vertex) * N);

    // host base pointer; the successors pointers are assumed to point into all_vertices
    const vertex* host_base = all_vertices.data();

    for(int i = 0; i < N; i++){
        unsigned int n_suc = tmp[i].n_successors;
       //make copy of adjacency list of vertex i (std::make_unique needs <memory> and C++14)
       std::unique_ptr<vertex*[]> adj_copy = std::make_unique<vertex*[]>(n_suc);

       //find pointers in device adjacency list: same index, but relative to the device array
       for(unsigned int j = 0; j < n_suc; j++)
           adj_copy[j] = all_vertices_device + (all_vertices[i].successors[j] - host_base);

       //allocate device adjacency list
       cudaMalloc(&(tmp[i].successors), sizeof(vertex*) * n_suc);
       //copy adj_copy (which contains the device pointers for the adjacency list) to device
       cudaMemcpy(tmp[i].successors, adj_copy.get(), sizeof(vertex*) * n_suc, cudaMemcpyHostToDevice);
    }
    cudaMemcpy(all_vertices_device, tmp.data(), sizeof(vertex) * N, cudaMemcpyHostToDevice);
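
The pointer translation itself can be checked on the host without a GPU, with a second host array standing in for the device allocation. The sketch below is not from the thread: translate_ptr and translate_list are hypothetical names, and it assumes every successor pointer points into the host vertex array. It goes through the element index rather than a raw difference between the two arrays, because subtracting pointers into two unrelated arrays is undefined behaviour in C++, and the byte distance between them need not be a multiple of sizeof(vertex).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    vertex** successors;
};

// Hypothetical helper: map a pointer into the host array to the pointer with
// the same index in the target (device) array. Only pointer arithmetic is
// done here; the target pointer is never dereferenced on the host.
vertex* translate_ptr(const vertex* host_ptr, const vertex* host_base, vertex* device_base) {
    std::ptrdiff_t index = host_ptr - host_base; // both point into the same host array
    return device_base + index;
}

// Hypothetical helper: build the translated adjacency list of one vertex.
// The result is a host buffer holding target-side addresses.
std::vector<vertex*> translate_list(const vertex& v, const vertex* host_base, vertex* device_base) {
    std::vector<vertex*> out(v.n_successors);
    for (unsigned int j = 0; j < v.n_successors; ++j)
        out[j] = translate_ptr(v.successors[j], host_base, device_base);
    return out;
}
```

In the real code, device_base would be the pointer returned by cudaMalloc, and the vector's data() would be the source of the per-vertex cudaMemcpy.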

cudaMancpy · January 25, 2018, 4:35am

std::unique_ptr<vertex*> adj_copy = std::make_unique<vertex*>(n_suc);

I get a syntax error on this line.
What are unique_ptr and make_unique?
I haven't seen code like that before.

One more question:
once the kernel has finished, how do I copy the results from the device back to the host with your approach?

striker159 · January 25, 2018, 8:44am

It is just a buffer (see std::unique_ptr on cppreference.com); note that std::make_unique requires C++14. You could use std::vector<vertex*> instead, and in line 32 change adj_copy.get() to adj_copy.data().

To copy from device to host, you do essentially the same thing in reverse. Copy everything into a temporary vertex array, then copy all data members except the adjacency list (I assume the list is not modified on the GPU) into the original vertex array.
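
The merge step described above might look like the following host-only sketch. It is not from the thread: merge_results is a hypothetical name, and from_device is assumed to have already been filled by a cudaMemcpy with cudaMemcpyDeviceToHost (left out so the block runs without a GPU).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct vertex {
    unsigned int vertex_id;
    float pagerank;
    float pagerank_next;
    unsigned int n_successors;
    vertex** successors;
};

// Hypothetical helper: take the values computed on the GPU from the temporary
// array and store them in the original host array.
void merge_results(std::vector<vertex>& host, const std::vector<vertex>& from_device) {
    assert(host.size() == from_device.size());
    for (std::size_t i = 0; i < host.size(); ++i) {
        host[i].pagerank = from_device[i].pagerank;
        host[i].pagerank_next = from_device[i].pagerank_next;
        // successors is deliberately left alone: from_device[i].successors
        // holds a device address that must not be dereferenced on the host.
    }
}
```

Copying field by field keeps the host adjacency lists valid, which a plain assignment of the whole struct would not.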
