compress an integer device array

I am not sure what the best way is to extract the nonzero values of an integer device array A and place them into another device array B.

The pack intrinsic would be the obvious choice, but it does not seem to work with device arrays.

Hi abalogh,

At the moment we don’t support the pack intrinsic for device arrays. The most straightforward (though probably not the most efficient) way to solve this is to copy the device array back to the host, do the packing there, and copy the packed data back to the GPU.
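
A minimal sketch of that host round trip, assuming a small test array. The names a_h, a_d, b_h, b_d and nnz are made up for illustration, and the file needs to be built with CUDA Fortran enabled (for example a .cuf extension):

```fortran
program pack_via_host
  use cudafor
  implicit none
  integer, parameter :: n = 8
  integer :: a_h(n)                        ! host staging copy of A
  integer, allocatable :: b_h(:)           ! packed result on the host
  integer, device :: a_d(n)                ! device array A
  integer, device, allocatable :: b_d(:)   ! device array B
  integer :: nnz

  ! Illustrative input only; in real code A already lives on the device.
  a_h = [0, 3, 0, 0, 7, 1, 0, 9]
  a_d = a_h                                ! host -> device

  a_h = a_d                                ! device -> host
  nnz = count(a_h /= 0)                    ! number of nonzero entries
  allocate(b_h(nnz), b_d(nnz))
  b_h = pack(a_h, a_h /= 0)                ! pack on the host, order preserved
  b_d = b_h                                ! host -> device

  b_h = b_d                                ! copy back only to inspect B
  print *, nnz, b_h
end program pack_via_host
```

The two extra transfers are the cost of this approach, so it is mainly reasonable when the array is small or the packing happens rarely.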

I ended up

  1. counting the nonzero entries with a CUF kernel
  2. sorting the array through a Thrust interface
  3. copying the nonzero part of the sorted array with a CUF kernel

Not very elegant.
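
For reference, step 1 could look roughly like the sketch below. It relies on the scalar reduction that CUF kernels generate for a variable such as nnz; the module, function and variable names are hypothetical, and the Thrust sort in step 2 needs a separate C wrapper that is not shown here.

```fortran
module count_mod
  use cudafor
  implicit none
contains
  ! Count the nonzero entries of a device array with a CUF kernel.
  function count_nonzero(a_d, n) result(nnz)
    integer, intent(in) :: n
    integer, device, intent(in) :: a_d(n)
    integer :: nnz, i

    nnz = 0
    ! The compiler turns the update of nnz into a reduction on the device.
    !$cuf kernel do <<<*, *>>>
    do i = 1, n
      if (a_d(i) /= 0) nnz = nnz + 1
    end do
  end function count_nonzero
end module count_mod

program count_demo
  use cudafor
  use count_mod
  implicit none
  integer, parameter :: n = 8
  integer :: a_h(n)
  integer, device :: a_d(n)

  a_h = [0, 3, 0, 0, 7, 1, 0, 9]   ! illustrative data
  a_d = a_h
  print *, 'nonzero entries:', count_nonzero(a_d, n)
end program count_demo
```

Note that sorting does not keep the nonzero values in their original order, which pack would do.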