Computing dual of the graph


I have a vector of A that stores per node connected node count and vector of B stores per node connected node indices like:

A = [1,2,1] // counts
B = [0,0,1,2] // indices

I would like to compute dual of this graph with Cuda, such that

C = [2,1, 1] // counts
D = [0,1,1,2] // indices

Currently my solution is:

  • Run kernel and compute how many times each node referenced, store it in vector C
  • Compute total count L (reduce C, potentially Thrust or Bulk)
  • Allocate vector of D in size of L
  • Compute exclusive scan of C (Thrust or Bulk), store it in vector of O (used as offset when dual indices gets stored)
  • Run another kernel that populate indices and stores them in D accounting offsets in O

Is there a better way to do this? Is it possible to do this using only Thrust or Bulk? As they probably will perform better than my kernels?

Thanks in advance.