I am trying to accomplish a way of determining all the elements of an array which are non zero, their values and locations are the desired output. My initial thoughts were to perform a reduction, as I have seen this done for summing the values of an array. This I have found would work fine a long as only one of the elements at the same reduction location is zero and would fail for locations 1 & 9 in my example below. My next thought was to just simply perform a check on all threads, and record the non-zero elements. Both ways I have thought of achieving it this way have their problems though. If I were to simply store the values in global memory at the present thread location, this would just simply recreate the matrix. Therefore I would need a way to keep a counter of how many non-zero elements there are so that I can store them correctly, which could cause problems between threads.
So my question is this, is there a way of performing a reduction on the stated problem to eliminate all zero elements, or is there a way to keep track of what non-zero element is what?
I am familiar with the CUDA development environment, and some of the libraries, such as the FFT library, but less familiar with the Linear Algebra library and Sparse library. If there is a function in here that could acomplish that I have missed, or a way of solving this that I have not thought of, it would be greatly appreciated.
Inputs Array: [1, 0, 0, 0, 99, 0, 0, 56, 45, 10, 0, 0, 0, 23, 0, 7]
Output Array: [1, 99, 56, 45, 10, 23, 7]
Output Locations: [1, 5, 8, 9, 10, 14, 16]
Thanking you in advance.
Richard