linked lists in CUDA Creating linked lists with CUDA


Has anyone here worked with linked lists in CUDA…

If i want to port a CPU function that deals with linked lists what things should i change…

or can the linked list be simply copied to the Device memory and used in the function??

Can you provide suggestions here please…

It sucks. Don’t do it. You need to rethink your algorithm in a serious way.

That you’re asking this question suggests to me that you need to study a lot more CUDA before you’re ready to seriously write code. I’d re-read the programmer’s guide, specifically the sections on the memory hierarchy and the execution model, and probably listen to the UIUC lectures.

(a. linked lists would be very ugly to copy because of the excessive amounts of malloc/free unless you do something silly with fixed-length arrays per list
b. linked lists algorithms are generally far too serial to perform well on the GPU
c. uncoalesced accesses everywhere)

i agree…am at the beginning stage of learning CUDA…

Regarding b there is a paper by W. Daniel Hillis and Guy L. Steele, Jr. that claims otherwise. However they are talking about systems build out of cpus.

am trying to check if i can get that paper…i also came aacross this one……33;OpenDocument

Well, no one said that you can’t do a linked list in CUDA. It is C, so you can do whatever you want. Just don’t expect anyone on these forums to help you because of your insanity.

As a simple example of why linked lists are near impossible: consider this. I want thread [i]i[/b] to read element i from the linked list. This cannot be done in any sort of efficient way. Each thread must read the linked list in it’s entirety up to element i to get at it! That is serial programming. (this is an expansion on tmurray’s point b above).

I will go out on a limb and say that structures of arrays will always be better, easier to program, and faster than the equivalent data in a linked list of structures on the GPU.

For those interested, the paper can be found at…-algorithms.pdf
The scan algorithm looks pretty similar to the one in the examples as the bind one node to one element, no reading all over the place. However up till now I avoided linked lists where possible (except in a small peace of binning code).

Maybe I misunderstood what they did but: what that algorithm processes is not a linked list in a general sense (a few data elements located at some random places in a memory that is huge compared to the list size and you have no clue where the other elements are except the first), but they assume that you already have the location of all list elements. To be fair, it does work if you have many lists and only know the locations of the elements of all lists, not to which (if any) they belong - but while that is probably not hard to achieve it is still not a “normal” linked list.

I remember reading this paper before and being annoyed by it because they make it sound like they found a solution to list traversing being O(N) while IMO they just changed the problem to suit them (which really is ok, just doing it without clearly saying so is not).

:-) thanks for the scolding but you gave me the concept …so its nice to change the linked lists to arrays…will try