Parallel Search Algorithms Parallel search algorithm for determining all non-zero array elements

Skip27388 · November 11, 2011, 12:17pm

I am trying to accomplish a way of determining all the elements of an array which are non zero, their values and locations are the desired output. My initial thoughts were to perform a reduction, as I have seen this done for summing the values of an array. This I have found would work fine a long as only one of the elements at the same reduction location is zero and would fail for locations 1 & 9 in my example below. My next thought was to just simply perform a check on all threads, and record the non-zero elements. Both ways I have thought of achieving it this way have their problems though. If I were to simply store the values in global memory at the present thread location, this would just simply recreate the matrix. Therefore I would need a way to keep a counter of how many non-zero elements there are so that I can store them correctly, which could cause problems between threads.

So my question is this, is there a way of performing a reduction on the stated problem to eliminate all zero elements, or is there a way to keep track of what non-zero element is what?

I am familiar with the CUDA development environment, and some of the libraries, such as the FFT library, but less familiar with the Linear Algebra library and Sparse library. If there is a function in here that could acomplish that I have missed, or a way of solving this that I have not thought of, it would be greatly appreciated.

Inputs Array: [1, 0, 0, 0, 99, 0, 0, 56, 45, 10, 0, 0, 0, 23, 0, 7]
Output Array: [1, 99, 56, 45, 10, 23, 7]
Output Locations: [1, 5, 8, 9, 10, 14, 16]

Thanking you in advance.

Richard

tera · November 11, 2011, 1:21pm

If you could add each element’s index to the data structure (so you could read the indices of the remaining elements after the operation), you could use one of thrust’s stream compaction functions.

Skip27388 · November 14, 2011, 9:45am

Thank you very much, after having had a look at stream compaction, I understand the problem a lot better now and understand that it can be solved using a prefix sum, to gain the correct addresses to store the desired values from the array.

Information for anyone else that is interested can be found here. [url=“http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html”]http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html[/url]

Thanks,

Richard