I have a vector of A that stores per node connected node count and vector of B stores per node connected node indices like:
A = [1,2,1] // counts
B = [0,0,1,2] // indices
I would like to compute dual of this graph with Cuda, such that
C = [2,1, 1] // counts
D = [0,1,1,2] // indices
Currently my solution is:
- Run kernel and compute how many times each node referenced, store it in vector C
- Compute total count L (reduce C, potentially Thrust or Bulk)
- Allocate vector of D in size of L
- Compute exclusive scan of C (Thrust or Bulk), store it in vector of O (used as offset when dual indices gets stored)
- Run another kernel that populate indices and stores them in D accounting offsets in O
Is there a better way to do this? Is it possible to do this using only Thrust or Bulk? As they probably will perform better than my kernels?
Thanks in advance.