Hi all! I am implementing an idea that was suggested to me. It takes one matrix (with many rows and 3 columns) and outputs 3 arrays: one to use with CUDPP compact (done), one to use with CUDPP sort, and one with the row indexes, also to be used with the CUDPP library.
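A CPU reference can make it clearer what the sort output should be. Sort keys are taken from one column of the matrix, and the row indexes are permuted together with them, which is the result a key-value sort produces. This is a sketch under stated assumptions: the matrix is a row-major `rows x 3` float array, `qsort` stands in for the GPU sort, and `KeyRow`, `cmp_keyrow` and `sort_rows_by_column` are hypothetical names, not CUDPP API. It is only meant for checking GPU results against.

```c
#include <stdlib.h>

/* Hypothetical helper type: one sort key plus the row it came from. */
typedef struct {
    float key;
    int row;
} KeyRow;

/* Order by key; break ties by original row so the result is deterministic. */
static int cmp_keyrow(const void *a, const void *b)
{
    const KeyRow *x = (const KeyRow *)a;
    const KeyRow *y = (const KeyRow *)b;
    if (x->key < y->key) return -1;
    if (x->key > y->key) return 1;
    return (x->row > y->row) - (x->row < y->row);
}

/* Fill keys[] with column keyCol of a row-major (rows x 3) matrix, sorted
 * ascending, and rowIdx[] with the matching original row numbers.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int sort_rows_by_column(const float *mat, int rows, int keyCol,
                        float *keys, int *rowIdx)
{
    KeyRow *tmp = malloc((size_t)rows * sizeof *tmp);
    if (tmp == NULL && rows > 0) return -1;

    for (int i = 0; i < rows; i++) {
        tmp[i].key = mat[i * 3 + keyCol];  /* element (i, keyCol) */
        tmp[i].row = i;                    /* unsorted row index  */
    }
    qsort(tmp, (size_t)rows, sizeof *tmp, cmp_keyrow);

    for (int i = 0; i < rows; i++) {
        keys[i] = tmp[i].key;
        rowIdx[i] = tmp[i].row;
    }
    free(tmp);
    return 0;
}
```

For a 4-row matrix whose first column is `3, 1, 2, 1`, calling it with `keyCol = 0` gives keys `1, 1, 2, 3` and row indexes `1, 3, 2, 0`. The row-index array is what you would later use to gather the other two columns in sorted order.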

I am now trying to build the array with the sort keys, but I am having a hard time thinking in parallel. I found a way to do it on the CPU:

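```c
#include <math.h>   /* INFINITY */

/* Selection-style sort on the CPU: on each pass, find the smallest value
 * still left in vecin, append it to vecout, and overwrite it with INFINITY
 * so it is never smaller than a remaining value. O(dim^2) work; vecin is
 * modified in place. */
float *cpusort(float *vecin, float *vecout, int dim)
{
    for (int i = 0; i < dim; i++) {
        /* Remember where the minimum is instead of searching for it twice. */
        int minIdx = 0;
        for (int j = 1; j < dim; j++) {
            if (vecin[j] < vecin[minIdx]) {
                minIdx = j;
            }
        }
        vecout[i] = vecin[minIdx];
        vecin[minIdx] = INFINITY;   /* float sentinel, above every finite input */
    }
    return vecout;
}
```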

Basically, on each pass it finds the minimum value of the array, stores that minimum in the output, and then replaces it in the input with a sentinel larger than any real value so that it is not picked again. It then searches for the next minimum, and so on. It works, but I am having a hard time translating this line of thought to the GPU, because each pass depends on the previous one. Basically, I want to obtain a sort key based on the matrix's first row. Does anyone have any thoughts on this? I do not want code, I just need some advice. Thanks in advance!
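One way to "think parallel" about the same problem is a rank sort, which is a different technique from the selection loop above. Instead of repeatedly asking "what is the next minimum?", each element independently computes its own final position: the number of elements smaller than it, plus the number of equal elements that appear earlier (so ties get distinct positions). No iteration depends on another, so on a GPU the outer loop body would be one thread per element. This is a plain C sketch assuming no NaN values, and `ranksort` is a hypothetical name. It still does O(n^2) work, so it illustrates the parallel formulation rather than replacing a library radix sort such as the one in CUDPP.

```c
/* Rank sort: element i goes to position rank(i), where
 *   rank(i) = #{ j : vecin[j] < vecin[i] } + #{ j < i : vecin[j] == vecin[i] }.
 * Ranks are unique, so every write goes to a different slot.
 * Assumes vecin contains no NaN (NaN comparisons would break the ranking). */
void ranksort(const float *vecin, float *vecout, int *order, int dim)
{
    for (int i = 0; i < dim; i++) {      /* on a GPU: i = global thread index */
        float v = vecin[i];
        int rank = 0;
        for (int j = 0; j < dim; j++) {
            if (vecin[j] < v || (vecin[j] == v && j < i)) {
                rank++;
            }
        }
        vecout[rank] = v;   /* sorted value                        */
        order[rank] = i;    /* original position, e.g. the row index */
    }
}
```

With input `3, 1, 2, 1, 0`, `vecout` becomes `0, 1, 1, 2, 3` and `order` becomes `4, 1, 3, 2, 0`. Unlike the CPU version, the input is left untouched, and the `order` array gives the row-index permutation directly.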