Hi! I am writing the following kernel to implement the Bellman-Ford routing algorithm. The edge weights need to be in double precision. The edges are represented using d_from for the starting node, d_to for the destination node, and d_edge_weigth for the weight of the edge. The vector d_node_weight is initialized to DBL_MAX for all nodes except the starting node, which is set to 0. The kernel updates the nodes' weights using an iterative process. Once no more changes are required, d_node_weight[i] contains the sum of the edge weights to go from the starting node to node i.
My question is: when writing to d_node_weight[to_node]
d_node_weight[to_node] = from_weigth_node + weigth;
is the store performed atomically, i.e. is it guaranteed that all 64 bits forming the double will come from the same thread? I know that many threads could be writing to the same d_node_weight[to_node], but I want to make sure that I will never get, let's say, the first 32 bits from one thread and the other 32 bits from another thread.
Thank you for your help.
__global__ void bellman_kernel(
    int *d_from,            ///< [in] Pointer to the from nodes
    int *d_to,              ///< [in] Pointer to the to nodes
    double *d_edge_weigth,  ///< [in] Pointer to the edge weight vector
    int nedges,             ///< [in] Number of elements in the d_from, d_to and d_edge_weigth vectors
    double *d_node_weight,  ///< [in/out] Pointer to the weights of the nodes
    bool *d_modified_flag)  ///< [out] Pointer to the flag, set to true if a change was made, set to false if no change was made
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned int grid_size = gridDim.x * blockDim.x;
    // Grid-stride loop: each thread relaxes edges tid, tid + grid_size, ...
    for (int i = tid; i < nedges; i = i + grid_size)
    {
        int from_node = d_from[i];
        int to_node = d_to[i];
        double weigth = d_edge_weigth[i];
        double from_weigth_node = d_node_weight[from_node];
        // Relax edge i. The compare and the store are separate, non-atomic steps,
        // so other threads may write d_node_weight[to_node] in between.
        if (from_weigth_node + weigth < d_node_weight[to_node]) {
            d_node_weight[to_node] = from_weigth_node + weigth;
            *d_modified_flag = true;
        }
    }
}
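To check the converged d_node_weight from the kernel, a serial host-side reference of the same iterative process could look like the sketch below. It is only a sketch: the function name bellman_reference and the parameters nnodes and start_node are made up for illustration and are not part of the kernel above. It also skips edges whose source node is still at DBL_MAX, which the kernel does not do, on the assumption that unreached nodes should never relax a neighbour.

```cpp
#include <cfloat>   // DBL_MAX
#include <cstddef>  // std::size_t
#include <vector>

// Hypothetical CPU reference: same edge-list layout as the kernel
// (edge i goes from[i] -> to[i] with weight edge_weight[i]), relaxed serially.
// Returns the node weights; nodes that are never reached stay at DBL_MAX.
std::vector<double> bellman_reference(const std::vector<int> &from,
                                      const std::vector<int> &to,
                                      const std::vector<double> &edge_weight,
                                      int nnodes, int start_node)
{
    std::vector<double> node_weight(nnodes, DBL_MAX);
    node_weight[start_node] = 0.0;

    bool modified = true;
    // Repeat full passes over the edges until a pass changes nothing.
    // Without negative cycles at most nnodes - 1 passes make a change;
    // the cap of nnodes passes keeps the loop finite otherwise.
    for (int pass = 0; modified && pass < nnodes; ++pass) {
        modified = false;
        for (std::size_t i = 0; i < from.size(); ++i) {
            const double from_weight = node_weight[from[i]];
            if (from_weight == DBL_MAX) continue;  // source not reached yet
            const double candidate = from_weight + edge_weight[i];
            if (candidate < node_weight[to[i]]) {
                node_weight[to[i]] = candidate;
                modified = true;
            }
        }
    }
    return node_weight;
}
```

Because this version runs in a single thread, every store is trivially whole, so any difference between its result and the kernel's converged d_node_weight would point to a problem on the GPU side.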