Hi, I’m writing the Floyd-Warshall algorithm and I’m having trouble resolving branch divergence. I tried some methods I found online, but they made performance worse. I would like to know whether that increase in execution time is normal.
In addition, does the bounds check (Row < num_vertices && Col < num_vertices) cause branch divergence?
template<typename T>
__global__ void floyd_warshall_kernel(T* matrix, int k, int num_vertices) {
    // Global 2D coordinates of the matrix element handled by this thread
    int Row = blockIdx.y * blockDim.y + threadIdx.y;
    int Col = blockIdx.x * blockDim.x + threadIdx.x;
    // Indices
    int ik = Row * num_vertices + k;
    int kj = k * num_vertices + Col;
    int ij = Row * num_vertices + Col;
    // Algorithm
    if (Row < num_vertices && Col < num_vertices) { // does this cause branch divergence?
        // Relax the path i -> j through k only if both partial paths exist
        if (isfinite(matrix[ik]) &&
            isfinite(matrix[kj]) &&
            matrix[ik] + matrix[kj] < matrix[ij]) {
            matrix[ij] = matrix[ik] + matrix[kj];
        }
    }
}
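When comparing a divergence-free rewrite against the kernel above, a small host-side reference can help confirm that the results are unchanged before looking at timings. The sketch below is not part of the original code: the function names (relax_step_branchy, relax_step_min, floyd_warshall_reference) are hypothetical, and it assumes missing edges are stored as +infinity with no NaN or -infinity entries and no negative cycles. Under those assumptions the std::min form gives the same result as the explicit isfinite checks, because adding +infinity yields +infinity, which never wins the minimum.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical CPU reference: one pass for a fixed k, with the same
// explicit branch as the kernel in the post.
template <typename T>
void relax_step_branchy(std::vector<T>& matrix, int k, int num_vertices) {
    for (int row = 0; row < num_vertices; ++row) {
        for (int col = 0; col < num_vertices; ++col) {
            const T ik = matrix[row * num_vertices + k];
            const T kj = matrix[k * num_vertices + col];
            T& ij = matrix[row * num_vertices + col];
            if (std::isfinite(ik) && std::isfinite(kj) && ik + kj < ij) {
                ij = ik + kj;
            }
        }
    }
}

// Hypothetical branch-free variant of the same pass.
// Assumption: missing edges are +infinity and there are no NaN or
// -infinity entries, so ik + kj is +infinity whenever a path is missing.
template <typename T>
void relax_step_min(std::vector<T>& matrix, int k, int num_vertices) {
    for (int row = 0; row < num_vertices; ++row) {
        for (int col = 0; col < num_vertices; ++col) {
            const T ik = matrix[row * num_vertices + k];
            const T kj = matrix[k * num_vertices + col];
            T& ij = matrix[row * num_vertices + col];
            ij = std::min(ij, ik + kj);
        }
    }
}

// Runs all k passes on a copy of the input and returns the result.
template <typename T>
std::vector<T> floyd_warshall_reference(std::vector<T> matrix,
                                        int num_vertices,
                                        bool use_min) {
    for (int k = 0; k < num_vertices; ++k) {
        if (use_min) {
            relax_step_min(matrix, k, num_vertices);
        } else {
            relax_step_branchy(matrix, k, num_vertices);
        }
    }
    return matrix;
}
```

Running both variants on the same input and comparing the outputs element by element, and then against the GPU result copied back to the host, separates correctness questions from performance questions.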