Linked List on CUDA

Has anybody in here worked with linked lists in CUDA? A previous thread that I found wasn’t so helpful.

My problem is not so complicated.

Imagine a structure like this one:

struct box
{
    int x;
    int y;
    float* data;
};

and let’s say we have a pointer like this:

struct box *boxes;

The sizes of data and boxes are known.

How can I transfer everything that boxes points to, including each data array, from host to device?

I would guess that you would have to condense all your structures into an array first and then copy them to the GPU as one block. The overhead of many small host-to-GPU copy operations would be prohibitive, considering that your list may have thousands of elements…
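A minimal sketch of that condensing step, in plain host-side C. The struct node type and its next pointer are hypothetical (the question does not show them), and flatten_list is an illustrative name, not an existing API. Each node's next pointer is replaced by an array index, so the result stays valid at any address.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node; the question never shows a next pointer. */
struct node {
    int x;
    int y;
    struct node *next;
};

/* Flat form: next is an index into the array, -1 marks the end. */
struct flat_node {
    int x;
    int y;
    int next;
};

/* Walks the list and returns a malloc'd array holding *count nodes in
   list order. Returns NULL for an empty list or on allocation failure. */
struct flat_node *flatten_list(const struct node *head, size_t *count)
{
    assert(count != NULL);

    /* First pass: count the nodes. */
    size_t n = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        n++;
    *count = n;
    if (n == 0)
        return NULL;

    struct flat_node *flat = malloc(n * sizeof *flat);
    if (flat == NULL)
        return NULL;

    /* Second pass: copy the fields and turn pointers into indices. */
    size_t i = 0;
    for (const struct node *p = head; p != NULL; p = p->next, i++) {
        flat[i].x = p->x;
        flat[i].y = p->y;
        flat[i].next = (p->next != NULL) ? (int)(i + 1) : -1;
    }
    return flat;
}
```

The resulting array could then go to the device with a single cudaMemcpy of count * sizeof(struct flat_node) bytes, and a kernel would follow the next indices instead of pointers.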

Alternatively, you could make sure that all your structures and data reside in one memory region by using memory pooling. So instead of malloc() or new, use a pooled implementation where you control the memory region that gets used. Then you can just copy the entire region. But you would have to translate your pointers for them to be usable on the GPU. How to do that efficiently, I don’t know. Maybe passing a pointer offset to the GPU kernel would work, so that the pointer arithmetic happens in your kernel. But if the host CPU uses 64-bit pointers and the GPU uses 32-bit pointers, you’re screwed ;)

UPDATE: I was assuming your structures are linked in the form of a singly linked list, but apparently I was mistaken (where is the next pointer?). Anyway, pooling might work in your case too if you place your struct box and the data in the same pooled memory region.
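A rough sketch of the pooling idea, in plain host-side C. Everything here (struct pool, struct pooled_box, pool_alloc, pool_add_box, box_data) is a hypothetical name made up for illustration. One way to sidestep the pointer translation is to store a fixed-width byte offset from the start of the pool instead of a raw pointer, so the same bytes can be interpreted against any base address.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POOL_FULL ((size_t)-1)

/* Variant of struct box that stores an offset instead of a pointer. */
struct pooled_box {
    int x;
    int y;
    uint64_t data_offset;  /* byte offset of the floats from the pool base */
};

/* Hypothetical bump allocator over one contiguous region. */
struct pool {
    unsigned char *base;
    size_t used;
    size_t capacity;
};

/* Returns 1 on success, 0 if the region could not be allocated. */
int pool_init(struct pool *p, size_t capacity)
{
    p->base = malloc(capacity);
    p->used = 0;
    p->capacity = (p->base != NULL) ? capacity : 0;
    return p->base != NULL;
}

/* Reserves `bytes` bytes and returns their offset, or POOL_FULL. */
size_t pool_alloc(struct pool *p, size_t bytes)
{
    size_t start = (p->used + 7u) & ~(size_t)7u;  /* keep 8-byte alignment */
    if (start > p->capacity || bytes > p->capacity - start)
        return POOL_FULL;
    p->used = start + bytes;
    return start;
}

/* Places a box header and a copy of its n floats in the pool.
   Returns the header's offset, or POOL_FULL. If only the second
   reservation fails, the header's space stays reserved. */
size_t pool_add_box(struct pool *p, int x, int y, const float *src, size_t n)
{
    size_t box_off = pool_alloc(p, sizeof(struct pooled_box));
    if (box_off == POOL_FULL)
        return POOL_FULL;
    size_t data_off = pool_alloc(p, n * sizeof(float));
    if (data_off == POOL_FULL)
        return POOL_FULL;

    struct pooled_box *b = (struct pooled_box *)(p->base + box_off);
    b->x = x;
    b->y = y;
    b->data_offset = data_off;
    memcpy(p->base + data_off, src, n * sizeof(float));
    return box_off;
}

/* Resolves a box's data against whatever base the pool now lives at. */
float *box_data(void *base, const struct pooled_box *b)
{
    return (float *)((unsigned char *)base + b->data_offset);
}
```

With this layout the first p.used bytes of the pool could be copied to the device in one cudaMemcpy, and a kernel would do the same base-plus-offset arithmetic as box_data using the device pointer as base.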

Thanks for your answer mate!
Yes, it’s not really a linked list. I don’t know how to describe it.

That’s a really nice idea, I’ll look into it…

Anyway, I just decided to split the data and copy it into two big buffers!
Memory pooling caused me a headache…
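
For anyone finding this later, here is roughly what the two-buffer split can look like in plain host-side C. This is only a sketch: struct box_header and split_boxes are made-up names, and it assumes every box holds the same number of floats (floats_per_box), which the thread does not actually state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct box {
    int x;
    int y;
    float *data;
};

/* Per-box header without the host pointer. */
struct box_header {
    int x;
    int y;
    int data_start;  /* index of this box's first float in the data buffer */
};

/* Builds one header buffer and one float buffer from an array of boxes.
   Assumes each box has exactly floats_per_box floats. On success returns 1
   and hands both malloc'd buffers to the caller; on failure returns 0. */
int split_boxes(const struct box *boxes, size_t num_boxes,
                size_t floats_per_box,
                struct box_header **headers_out, float **data_out)
{
    assert(boxes != NULL && headers_out != NULL && data_out != NULL);
    assert(num_boxes > 0 && floats_per_box > 0);

    struct box_header *headers = malloc(num_boxes * sizeof *headers);
    float *data = malloc(num_boxes * floats_per_box * sizeof *data);
    if (headers == NULL || data == NULL) {
        free(headers);
        free(data);
        return 0;
    }

    for (size_t i = 0; i < num_boxes; i++) {
        headers[i].x = boxes[i].x;
        headers[i].y = boxes[i].y;
        headers[i].data_start = (int)(i * floats_per_box);
        /* Append this box's floats to the shared data buffer. */
        memcpy(data + i * floats_per_box, boxes[i].data,
               floats_per_box * sizeof *data);
    }

    *headers_out = headers;
    *data_out = data;
    return 1;
}
```

Each buffer then needs just one cudaMemcpy, and a kernel can reach element k of box i as data[headers[i].data_start + k].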