Hi all,
I have a kernel which spends approximately 33% of its time performing a binary search operation to do a table lookup.
The search runs over an array of energy values (sorted, but not uniformly spaced) to find the array index whose energy is nearest to my energy of interest.
However, two things complicate an efficient implementation in CUDA:

This “energy mesh” is very large: ~131 KB in a sample problem I am toying with, and a real problem would be substantially larger.

Access will not be coalesced across threads, because the energy of interest for each thread is essentially random.
Does anyone have any experience, lessons learned, or suggestions for attacking this type of problem? I have pasted my texture search and regular array search functions below. The texture search is quirky for the time being and seems to take even longer.
I’ll be honest, I don’t quite get what Section D.3 of the programming guide is telling me.
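[codebox]__device__ unsigned int textureSearch(unsigned int first, unsigned int last, float key, unsigned int loc)
{
    // Search the inclusive range [first, last] of the texture-bound mesh.
    // A half-open upper bound replaces the return_val == 0 sentinel (which
    // misfires on a match at index 0) and avoids unsigned underflow of mid - 1.
    unsigned int end = last + 1;
    while (first < end)
    {
        unsigned int mid = first + (end - first) / 2;
        float e = tex1Dfetch(big_Emesh_t, mid);  // fetch once per iteration
        if (key > e)
            first = mid + 1;
        else if (key < e)
            end = mid;
        else
        {
            first = mid;  // exact match
            break;
        }
    }
    // Assumption: this line originally read "return_val -= ..." and lost its
    // minus sign when posted (like "mid - 1"), so the offset is subtracted
    // to turn the global index into one local to table loc.
    return first - big_Emesh_offsets_d[loc];
}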
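__device__ unsigned int binarySearch(float* sortedArray, unsigned int first, unsigned int last, float key)
{
    // Search the inclusive range [first, last]. Returns the index of key if
    // present, otherwise the index of the first element greater than key.
    // The half-open bound avoids unsigned underflow of mid - 1 when mid == 0.
    unsigned int end = last + 1;
    while (first < end)
    {
        unsigned int mid = first + (end - first) / 2;
        float e = sortedArray[mid];  // read once, compare twice
        if (key > e)
            first = mid + 1;
        else if (key < e)
            end = mid;
        else
            return mid;
    }
    return first;
}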
[/codebox]
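Since what I ultimately want is the nearest index rather than the insertion point, here is a minimal sketch of that last step. The name nearestIndex and the HOST_DEVICE macro are hypothetical (they are not in my kernel), and it assumes the mesh is sorted ascending with at least one entry. It is written so the same function can be built as plain C++ on the host for checking, or as device code under nvcc.

```cpp
// Lets the same function build as plain C++ or as CUDA host/device code.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: index of the mesh point nearest to key.
// Assumes mesh[0..n-1] is sorted ascending and n >= 1.
HOST_DEVICE unsigned int nearestIndex(const float* mesh, unsigned int n, float key)
{
    // Lower bound: first index whose energy is >= key (n if there is none).
    unsigned int lo = 0, hi = n;
    while (lo < hi)
    {
        unsigned int mid = lo + (hi - lo) / 2;
        if (mesh[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0) return 0;      // key is at or below the first mesh point
    if (lo == n) return n - 1;  // key is above the last mesh point
    // key lies in (mesh[lo-1], mesh[lo]]; pick the closer neighbour
    // (ties go to the lower index).
    return (key - mesh[lo - 1] <= mesh[lo] - key) ? lo - 1 : lo;
}
```

For a mesh of {1, 2, 4, 8, 16}, a key of 3.5 lands between indices 1 and 2 and is closer to 4, so index 2 comes back. Keys outside the mesh clamp to the first or last index.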
Thanks all,
Adam