I have a sorted list of values from 0 to 1, like
0.1 0.3 0.35 0.6 0.8 1.0
but the number of element can exceed the 1000000.
I must take an input value and find where is placed in the 1000000 elements, the index.
My naive solution is to do an algorithm like Quicksort(without the inserting or move element) with subdivision, but this is non parallel, and i don’t know how to operate with this number of element!
I must use only the global memory?
I can use a preprocessed binary tree?
there is a more quick and parallel algorithm?
If the array is already sorted, then why not just use a binary search?
A standard binary search will not be parallel, but that may not matter – binary search will only take about log2(N)+1 steps, so finding an entry in a list of 1 million will take about 21 steps. There are parallel binary search algorithms, but I would first check if a naive implementation is sufficient. IF you are searching for multiple entries, then I would parallelize in that dimension.
The thrust library already has binary search implemented; see thrust::binary_search.
If it’s a sorted array, the cpu version might fare better than the gpu offloaded version.
Probably with an unsorted version, the gpu version might perform better and this again with a big enough buffer to hide latency.
My problem now is that i must find the element in array and the two neighbors, the previous and the next, for 1000 elements.
I have two solution:
1)send the 1000 elements that have to find the neighbors on the gpu , and if is possible use a gpu algorithm.
2)do all on the cpu
the 1000 elements are in a scenegraph on the cpu, then i must collect all these element and do the algorithm.
But if i must do this for the gpu, what is the best memory object for collect data and send it to the gpu?
You can probably come back to the memory later. The gains used by texture memory against global memory is secondary considering the problem in hand. You need need to figure out if it’s worthwhile doing it on the gpu first.
- Take a buffer, fix a size S, say 1000 elements with random values.
- Try your solution on your cpu, and profile the time taken.
- Next try it on your gpu, and similary profile the time taken.
- Go back to step 1, trying out different array sizes, probably upto a couple of megs.
For the gpu, you can start with the global memory for now and then move on the texture memory.