moderngpu 2.0

I posted moderngpu 2.0 this week:

It’s the best code I’ve ever written. I encourage all users to give it a try.

The new points of power are the load-balancing search transform and segreduce functions, transform_lbs and lbs_segreduce.
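
For readers who have not met load-balancing search before, here is a rough serial sketch of what the pattern computes. It is not the library's implementation, and `lbs_reference` is a hypothetical name used only for illustration. It assumes the segments are described by their start offsets (an exclusive scan of the segment sizes) and returns, for every work-item, the segment it belongs to and its rank within that segment.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Serial reference for the result of a load-balancing search.
// `segments` holds the start offset of each segment (an exclusive scan of
// the segment sizes); `count` is the total number of work-items.
// Hypothetical helper for illustration only; not part of moderngpu.
std::vector<std::pair<int, int>> lbs_reference(
    const std::vector<int>& segments, int count) {
  assert(count == 0 || !segments.empty());
  const int num_segments = static_cast<int>(segments.size());
  std::vector<std::pair<int, int>> out(count);
  int seg = 0;
  for (int index = 0; index < count; ++index) {
    // Step past every segment that starts at or before this index.
    // Empty segments are skipped because they share a start offset
    // with the segment that follows them.
    while (seg + 1 < num_segments && segments[seg + 1] <= index) ++seg;
    out[index] = {seg, index - segments[seg]};  // (segment, rank)
  }
  return out;
}
```

With segment sizes 2, 0 and 3 (offsets 0, 2, 2) and five work-items, the empty middle segment receives no work-items and every other item lands in the right segment with the right rank. A parallel version hands each thread the same (index, segment, rank) triple no matter how uneven the segment sizes are.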

It also has an experimental dynamic work-creation mechanism, lbs_workcreate. This is a two-pass function that combines load-balancing search, stream compaction and prefix scan to allow work-items to generate new segments of work. There is a work-efficient breadth-first search demo using this mechanism that is economically written and is magically load-balanced by the lbs_workcreate pattern.
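
To make the two-pass idea concrete, here is a simplified serial sketch of breadth-first search written in that shape. It is not the demo that ships with the library and does not call lbs_workcreate. `CsrGraph`, `Work`, `create_work` and `bfs_depths` are hypothetical names, and the graph is assumed to be stored in compressed sparse row form. Each work-item is one edge of the current frontier; an item that discovers an unvisited vertex creates a new segment holding that vertex's out-edges.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse row graph: the out-edges of vertex v are
// col[row[v]] .. col[row[v + 1] - 1]. Hypothetical type for illustration.
struct CsrGraph {
  std::vector<int> row;  // size is num_vertices + 1
  std::vector<int> col;
};

struct Work {
  std::vector<int> vertices;  // segment i holds the out-edges of vertices[i]
  std::vector<int> segments;  // start offset of segment i (exclusive scan)
  int count = 0;              // total work-items across all segments
};

// One BFS level in the two-pass shape, run serially.
Work create_work(const CsrGraph& g, const Work& cur,
                 std::vector<int>& depth, int level) {
  // Pass 1: every work-item (one edge) records the vertex it discovers,
  // or -1 if it creates no new work. The inner while loop is the serial
  // stand-in for load-balancing search. A parallel version would need an
  // atomic operation to claim the destination vertex.
  std::vector<int> discovered(cur.count, -1);
  std::size_t seg = 0;
  for (int index = 0; index < cur.count; ++index) {
    while (seg + 1 < cur.segments.size() && cur.segments[seg + 1] <= index)
      ++seg;
    const int rank = index - cur.segments[seg];
    const int dst = g.col[g.row[cur.vertices[seg]] + rank];
    if (depth[dst] < 0) {
      depth[dst] = level;
      discovered[index] = dst;
    }
  }

  // Pass 2: compact away the items that created nothing, then scan the
  // requested sizes to get the start offset of each new segment.
  Work next;
  for (int dst : discovered) {
    if (dst < 0) continue;
    next.vertices.push_back(dst);
    next.segments.push_back(next.count);
    next.count += g.row[dst + 1] - g.row[dst];
  }
  return next;
}

// Runs create_work level by level until no work-items remain.
// Unreachable vertices keep a depth of -1.
std::vector<int> bfs_depths(const CsrGraph& g, int source) {
  std::vector<int> depth(g.row.size() - 1, -1);
  depth[source] = 0;
  Work work;
  work.vertices = {source};
  work.segments = {0};
  work.count = g.row[source + 1] - g.row[source];
  for (int level = 1; work.count > 0; ++level)
    work = create_work(g, work, depth, level);
  return depth;
}
```

The sketch is work-efficient in the same sense as the demo: each edge is examined once, and the amount of work per level depends on the number of frontier edges rather than on how they are distributed across vertices.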

The source is only about 5500 lines yet the library has more functional coverage than any other general-purpose CUDA library.


Hi Sean,

The code looks good. I have a couple of questions, though:

  1. Is there any documentation?
  2. We are looking for a fast string search capability and were considering using your segmented search. From our initial reading of the code, however, it looked as though your segmented search implementation only supports scalar keys and doesn’t allow variable-length keys such as strings. Could you comment on this and recommend an approach that would give us a fast variable-length string search?

Thank you,