Remove duplicated elements

Hi all

I want to remove duplicated elements of an array of unsigned chars. This array is located in global memory and is not sorted.

For instance, the array contains:
array={‘a’,‘f’,‘b’,‘b’,‘a’,‘a’,‘b’,‘e’,‘a’}
the output should be:
array={‘a’,‘f’,‘b’,‘0’,‘0’,‘0’,‘0’,‘e’,‘0’}
or
array={‘0’,‘f’,‘b’,‘0’,‘a’,‘0’,‘0’,‘e’,‘0’}

Zeros marks this element as already in the list. The position of the zeros is not important (that’s because the two possible solutions). On the other hand is important that those elements which are non-zero must to stay in the same position of the array. In other words I can’t sort the list and then remove the duplicated.

I need to do this very very fast and in the CPU is too slow since the size the array is very big. Anybody knows a way in CUDA to do this? And if it is possible, there should not exists too much non-coalesced access.

Thank you very much.

What is the range of size of the array ?

Will the array always contain at least one of every possible value ?

Will you be doing lots of such arrays simultaneously ?
(e.g. different data and even different sizes)

You can do this blazing fast on the CPU… far faster than sending the values to the GPU.

int seen[256]={0};

for (int i=0; i<N; i++) 

  if (!seen[myHugeArrayOfValues[i]]) seen[myHugeArrayOfValues[i]]=1;

  else myHugeArrayOfValues[i]=0;

As long as this is the only thing that you want you application to do, I would agree. If this is a building block in a larger application (where you can keep the data in GPU memory for most of the application), I think that the GPU has the potential to be significantly faster.

A simple way to do would be to make two scans over the array. First record a valid position of each possible letter in a simple data structure (buffer it in shared memory for each CTA and then update an equivalent structure in global memory at the end of the first pass). Then, in the second pass, zero out all entries without an entry in the array. You would need 1 global read and, in the worst case, ~1 global write for each element in the array, so you could potentially hit ~1/2 of the total global memory bandwidth of the GPU. This is also the best case you could do on the CPU, but the available memory bandwidth would be significantly less.

The size of the array is fix. In fact is screen resolution size.

No

No. I want to do this just once in 1 array.

Thank you.

That’s a nice one, but there is just a problem. To simplify the explanation of my problem I said that is and array of uchars, but in fact is an array of array of uchars. The equals test must return true iff all the uchars are equals.

May be I could do some change to that algorithm to tets all uchars.

Thank you.

I’ll try it.

Thank you very much.