Priority Queues!

Is it possible to use a priority queue inside a kernel? I’ve been trying for a while but can’t figure it out! Thanks all :)

Per thread? Sure, just use a local array.

Per block? Sure, use a shared memory array, surround each access with __syncthreads(), and let just one thread touch the queue state at a time.
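To sketch what I mean (names, the capacity, and the int payload are all placeholders for your own types), a per-block max-heap in shared memory might look something like this, with thread 0 doing all the mutation between barriers:

```cuda
#define QUEUE_CAPACITY 256  // placeholder size; pick one that fits shared memory

__global__ void blockQueueKernel(/* your args */)
{
    __shared__ int heap[QUEUE_CAPACITY];
    __shared__ int heapSize;

    if (threadIdx.x == 0)
        heapSize = 0;
    __syncthreads();

    // ... all threads do their parallel work here ...

    __syncthreads();              // make every thread's work visible first
    if (threadIdx.x == 0) {
        // Single-threaded push: append, then sift up.
        int v = /* candidate value */ 0;
        int i = heapSize++;
        heap[i] = v;
        while (i > 0 && heap[(i - 1) / 2] < heap[i]) {
            int p = (i - 1) / 2;
            int t = heap[p]; heap[p] = heap[i]; heap[i] = t;
            i = p;
        }
    }
    __syncthreads();              // queue state now consistent for all threads
}
```

The point of the two barriers around the thread-0 section is that the queue is only ever read or written while every other thread in the block is known to be waiting.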

Per kernel? You’re playing with ugliness. Yes, it’s possible, but it’s inefficient, ugly, and totally not recommended. The basic trick is to implement a locking mechanism (usually with atomicExch()); then the winner of the lock fight can update the queue, then release the lock. But please don’t do this, it’s painful, it’s ugly, it’s loaded with pitfalls. Just pretend it’s not possible.
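For completeness only (really, don’t do this), the atomicExch() lock trick looks roughly like the following sketch. The names are illustrative, and `lock` must be a device-side int initialized to 0:

```cuda
__device__ int lock = 0;   // 0 = free, 1 = held

__device__ void queueUpdate(/* your global queue args */)
{
    // Pitfall #1: only one thread per block should contend, or warps can
    // deadlock on hardware without independent thread scheduling.
    while (atomicExch(&lock, 1) != 0)
        ;                       // spin until we win the lock fight
    __threadfence();            // see other threads' earlier queue writes

    // ... critical section: push/pop/update the global queue ...

    __threadfence();            // publish our queue writes before releasing
    atomicExch(&lock, 0);       // release the lock
}
```

The __threadfence() calls are needed because winning the lock alone doesn’t order global memory accesses across blocks, which is exactly the kind of pitfall that makes this approach so ugly.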

Yeah, I meant per kernel. I guess I won’t be doing that then :P. Thanks for the help though :D

Depending on your problem, you could make a batch queue, likely managed by the CPU. The basic idea is to pop the top 200 or so nodes from your priority queue and assign each node to one block of a kernel launch, so they’re processed in parallel. When the kernel is done, the CPU fires off the next batch of 200.
If the blocks are re-pushing new nodes onto the queue, those might be able to be added by the CPU as the next batch is cooking on the GPU.

Note this assumes a big queue of thousands of items, and relies on the hope that your app doesn’t need to process them in exact priority order. But both of those attributes are likely needed for any approach that tries to parallelize work with a priority queue.

Basically I’m trying to implement Kruskal’s minimum spanning tree algorithm, ideally on graphs with thousands of vertices. However, there could be cases of just a few vertices, so maybe it’s not entirely appropriate. Maybe I could just limit it to massive graphs :P

Hi, and do you know how to update a local memory priority queue? I’m having some difficulty implementing the queue update when I change the value of an element.

If you’re giving each thread its own priority queue, and you’re using local memory (likely as a heap), then the solution is just like a single-threaded CPU implementation…

You can probably Google some example code showing how to push a new value onto a priority queue, pop a value, or update an existing value.

The only small change you may need to make is that you’ll likely have a maximum static queue size (based on your local array size), whereas a lot of reference implementations use a dynamically growing container like std::vector.