void evaluate(int *from, int *to, int *start, int *duration, int *earliest)
It updates earliest with the last value written rather than the actual minimum. I have checked that this algorithm works correctly in a multithreaded version (using pthread.h), but I am not sure how CUDA handles each thread. Can someone explain what I need to do to make it work properly?
Your code will only work if the kernel is executed with only a single block, and if all values in to are unique.
Otherwise you have a race condition, as multiple threads will read and write the same element of earliest without any synchronization.
To make your kernel work correctly, use atomicMin().
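As a rough sketch of what that could look like: the kernel body is not shown in the question, so I am assuming each thread i proposes start[i] + duration[i] as a candidate for earliest[to[i]]. The names evaluate_atomic and run_evaluate_atomic, the extra parameter n, and the block size of 256 are my own choices for illustration, and from is left out because this assumed update does not use it.

```cuda
#include <cuda_runtime.h>
#include <climits>
#include <vector>

// Assumed update rule (the original kernel body is not shown):
// edge i proposes start[i] + duration[i] for earliest[to[i]].
__global__ void evaluate_atomic(const int *to, const int *start,
                                const int *duration, int *earliest, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Atomic read-modify-write: safe even when several threads,
        // possibly in different blocks, target the same element.
        atomicMin(&earliest[to[i]], start[i] + duration[i]);
    }
}

// Hypothetical host wrapper (error checking omitted for brevity).
std::vector<int> run_evaluate_atomic(const std::vector<int> &to,
                                     const std::vector<int> &start,
                                     const std::vector<int> &duration,
                                     int num_nodes)
{
    int n = static_cast<int>(to.size());
    std::vector<int> earliest(num_nodes, INT_MAX);  // "no value yet"
    size_t edge_bytes = n * sizeof(int);
    size_t node_bytes = num_nodes * sizeof(int);

    int *d_to, *d_start, *d_duration, *d_earliest;
    cudaMalloc(&d_to, edge_bytes);
    cudaMalloc(&d_start, edge_bytes);
    cudaMalloc(&d_duration, edge_bytes);
    cudaMalloc(&d_earliest, node_bytes);
    cudaMemcpy(d_to, to.data(), edge_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_start, start.data(), edge_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_duration, duration.data(), edge_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_earliest, earliest.data(), node_bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // more than one block is fine
    if (n > 0) {
        evaluate_atomic<<<blocks, threads>>>(d_to, d_start, d_duration,
                                             d_earliest, n);
    }
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(earliest.data(), d_earliest, node_bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_to);
    cudaFree(d_start);
    cudaFree(d_duration);
    cudaFree(d_earliest);
    return earliest;
}
```

Note that earliest has to be initialised (here to INT_MAX) before the launch, since atomicMin() only ever lowers the stored value.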
For optimal performance however you should also consider reorganizing your work to eliminate or minimise use of relatively slow atomic functions, and to use more than just one block.
If you make each thread work on exactly one element of earliest, looping over all values contributing to the minimum, you can eliminate writing out intermediate values to global memory. If the number of loop iterations varies a lot between different threads, you can consider looping over chunks of a fixed number of values (splitting the work between multiple threads), and use atomicMin() only to combine the final results of multiple chunks.
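A minimal sketch of the one-thread-per-element variant, under assumptions that go beyond the question: the values are pre-sorted by their target index, and a hypothetical offsets array first (length num_nodes + 1) marks where each target's values begin, so first[j] to first[j + 1] covers everything contributing to earliest[j]. As above, I am assuming the candidate is start[e] + duration[e]; the function names are made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <climits>
#include <vector>

// Assumption: values are grouped by target, and first[j]..first[j+1]
// delimits the values contributing to earliest[j].
__global__ void evaluate_gather(const int *first, const int *start,
                                const int *duration, int *earliest,
                                int num_nodes)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= num_nodes) return;

    int best = earliest[j];  // running minimum stays in a register
    for (int e = first[j]; e < first[j + 1]; ++e) {
        best = min(best, start[e] + duration[e]);
    }
    earliest[j] = best;      // single write, no atomics needed
}

// Hypothetical host wrapper (error checking omitted for brevity).
std::vector<int> run_evaluate_gather(const std::vector<int> &first,
                                     const std::vector<int> &start,
                                     const std::vector<int> &duration)
{
    int num_nodes = static_cast<int>(first.size()) - 1;
    size_t n = start.size();
    std::vector<int> earliest(num_nodes, INT_MAX);

    int *d_first, *d_start, *d_duration, *d_earliest;
    cudaMalloc(&d_first, first.size() * sizeof(int));
    cudaMalloc(&d_start, n * sizeof(int));
    cudaMalloc(&d_duration, n * sizeof(int));
    cudaMalloc(&d_earliest, num_nodes * sizeof(int));
    cudaMemcpy(d_first, first.data(), first.size() * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_start, start.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_duration, duration.data(), n * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemcpy(d_earliest, earliest.data(), num_nodes * sizeof(int),
               cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (num_nodes + threads - 1) / threads;
    if (num_nodes > 0) {
        evaluate_gather<<<blocks, threads>>>(d_first, d_start, d_duration,
                                             d_earliest, num_nodes);
    }
    cudaMemcpy(earliest.data(), d_earliest, num_nodes * sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(d_first);
    cudaFree(d_start);
    cudaFree(d_duration);
    cudaFree(d_earliest);
    return earliest;
}
```

Since each element of earliest is owned by exactly one thread, there is no race and no atomic at all. The price is the up-front sort or grouping by target, which may or may not pay off depending on how often the kernel is run on the same data.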