wrong results with divergent code

I have a strange problem: I am trying to find out which two consecutive floating-point values a given number lies between. I have a float array stored in global device memory. For the simplest case, I take three points in this array with values 0.0, 0.5 and 1.0. I need to find the largest array value that does not exceed the input, so for an input of 0.25 the result would be 0.0, and for 0.6 it would be 0.5.

This is my device function (called from my kernel) which returns the wrong value:

// positions: input array
// length: size of the array
// position: query point (always in [0,1])
__device__ int getValue(float *positions, int length, float position)
{
	int lower = 0;

	// find the lower limit
	for (int i = 0; i < length; i++)
	{
		if (positions[i] <= position && position < positions[i+1])
		{
			lower = i;
		}
	}

	return lower;
}

The problem is that it always returns the same value. When the code runs on the CPU, I get the correct result, which varies with the input. On the GPU, however, the result is always 2. I have verified that the position argument being supplied is correct. Does anyone know the reason for this?

If “length” is the actual length of positions, then “positions[i+1]” is off by one when i=length-1: it reads one element past the end of the array…
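
To make the off-by-one concrete, here is a minimal sketch of a bounded scan. The name findLowerLinear is made up for illustration, and the sketch assumes the array is sorted in ascending order and has at least two elements. The HD macro is only there so the same function compiles as plain C++ on the host and as a device function under nvcc.

```cpp
// Compiles as plain C++; under nvcc the function is also usable on the device.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper (not from the original post).
// Returns the index i of the interval [positions[i], positions[i+1])
// that contains `position`.
// Assumptions: positions is sorted ascending and length >= 2.
HD inline int findLowerLinear(const float *positions, int length, float position)
{
    // Fallback: the last interval, used when position >= positions[length - 1].
    int lower = length - 2;

    // Stop at length - 1 so that positions[i + 1] never reads past the end.
    for (int i = 0; i < length - 1; i++)
    {
        if (positions[i + 1] > position)
        {
            lower = i;  // first upper bound above the query
            break;      // the array is sorted, so later entries cannot be a better match
        }
    }
    return lower;
}
```

With the three values from the question, a query of 0.25 gives index 0 and a query of 0.6 gives index 1. A query equal to the last element falls back to the last interval instead of reading out of bounds.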


First, if you always compare against 0.0, 0.5 and 1.0, you might prefer constant memory.

Second, is your kernel executing without errors?

printf("%s", cudaGetErrorString(cudaGetLastError()));

Third, positions[i+1] is outside the allocated array on the last iteration, isn't it?

Definitely, since your array contains values in ascending order, you should not do it like that; you could at least write (stopping at the first match):
if( positions[i+1]>position ) {
    lower = i;
    break;
}

And you should naturally consider a binary (dichotomic) search!
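
For reference, a binary search over the sorted array could look roughly like the sketch below. The name findLowerBinary is invented here, and the code assumes ascending order and at least two elements. The HD macro just lets the same function build as host C++ or as a device function under nvcc.

```cpp
// Compiles as plain C++; under nvcc the function is also usable on the device.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical helper (not from the original post).
// Returns the largest index lo in [0, length - 2] with positions[lo] <= position,
// or 0 if position is below every element.
// Assumptions: positions is sorted ascending and length >= 2.
HD inline int findLowerBinary(const float *positions, int length, float position)
{
    int lo = 0;           // candidate lower index
    int hi = length - 1;  // index just above the candidate range

    // Halve the range [lo, hi] until the two indices are adjacent.
    while (hi - lo > 1)
    {
        int mid = lo + (hi - lo) / 2;
        if (positions[mid] <= position)
            lo = mid;  // query is at or above positions[mid]
        else
            hi = mid;  // query is below positions[mid]
    }
    return lo;
}
```

This never touches positions[length] and needs about log2(length) comparisons per query instead of a full scan. Every thread runs a similar number of iterations, which should also keep divergence within a warp low, although for an array of only three values the difference is unlikely to matter.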