Branch divergence issues

Hi, I’m writing a Floyd-Warshall kernel in CUDA and I’m having trouble resolving branch divergence. I tried some of the methods I found online, but they made the kernel slower. Is that increase in execution time expected?
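A common suggestion is to replace the inner `if` with a branchless `min`. Below is a minimal sketch of that idea. It assumes `float` distances, `INFINITY` for missing edges, a zero diagonal and no negative cycles. The names `fw_step_branchless` and `run_branchless` are hypothetical and do not come from the post. Because IEEE `INFINITY + x` stays `INFINITY`, `fminf` already handles unreachable pairs, so the `isfinite` checks are unnecessary for floats. This version stores to `matrix[ij]` in every thread, even when nothing improves. That extra global-memory traffic is one plausible reason such rewrites can run slower than the original. The compiler may also already predicate a branch this short, so the rewrite may not remove much divergence in the first place.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Branchless relaxation step for one value of k.
// Assumes float distances, INFINITY for "no edge", zero diagonal, no negative cycles.
__global__ void fw_step_branchless(float* d, int k, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;                   // the bounds check is still needed
    float via_k = d[row * n + k] + d[k * n + col];      // INFINITY + x == INFINITY, so no isfinite()
    d[row * n + col] = fminf(d[row * n + col], via_k);  // store happens even when nothing improves
}

// Hypothetical host helper: runs all n steps on a row-major n*n matrix held in h.
void run_branchless(std::vector<float>& h, int n) {
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float* d = nullptr;
    cudaError_t err = cudaMalloc(&d, bytes);
    assert(err == cudaSuccess);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(32, 8);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    for (int k = 0; k < n; ++k) {
        fw_step_branchless<<<grid, block>>>(d, k, n);  // launches on one stream run in order
    }
    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);  // waits for the kernels to finish
    cudaFree(d);
}
```

For a 3-vertex cycle 0→1 (4), 1→2 (1), 2→0 (2), the result should give 0→2 = 5, 1→0 = 3 and 2→1 = 6.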

Also, does the bounds check (Row < num_vertices && Col < num_vertices) cause branch divergence?
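You can check this empirically by asking, for each warp, whether its lanes disagree on the bounds predicate. Below is a rough sketch that assumes a `32 × 8` block, so each warp covers 32 consecutive columns of a single row. With the post's actual block shape, which isn't shown (for example `16 × 16`), the count would differ. The names `count_divergent_warps` and `divergent_warps` are hypothetical. `__ballot_sync` gathers the predicate from all 32 lanes. A warp diverges on the check only if the result is neither all zeros nor all ones. With this layout, only the last block column can diverge, so there is one divergent warp per row whenever `num_vertices` is not a multiple of 32. Every other warp takes a single path. The out-of-range lanes just sit idle, so this check is usually cheap compared with the global-memory loads.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Counts warps whose lanes disagree on the same bounds check used in the post.
__global__ void count_divergent_warps(int n, unsigned int* counter) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    bool in_bounds = (row < n && col < n);

    // Every lane of the warp reaches this call, so the full mask is valid.
    unsigned int votes = __ballot_sync(0xffffffffu, in_bounds);
    int lane = (threadIdx.y * blockDim.x + threadIdx.x) % 32;  // assumes a warp size of 32
    if (lane == 0 && votes != 0u && votes != 0xffffffffu) {
        atomicAdd(counter, 1u);  // one increment per divergent warp
    }
}

// Hypothetical host helper: number of warps that diverge on the bounds check for n vertices.
unsigned int divergent_warps(int n) {
    unsigned int* d_counter = nullptr;
    cudaError_t err = cudaMalloc(&d_counter, sizeof(unsigned int));
    assert(err == cudaSuccess);
    cudaMemset(d_counter, 0, sizeof(unsigned int));

    dim3 block(32, 8);  // blockDim.x == 32: each warp is one 32-column slice of one row
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    count_divergent_warps<<<grid, block>>>(n, d_counter);

    unsigned int h_counter = 0;
    cudaMemcpy(&h_counter, d_counter, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}
```

For example, `n = 100` launches 4 × 13 blocks (416 warps), and only the 100 warps covering columns 96–127 of rows 0–99 are divergent. For `n = 128` none are.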

```cuda
template<typename T>
__global__ void floyd_warshall_kernel(T* matrix, int k, int num_vertices) {

    int Row = blockIdx.y * blockDim.y + threadIdx.y;
    int Col = blockIdx.x * blockDim.x + threadIdx.x;

    // Indices
    int ik = Row * num_vertices + k;
    int kj = k * num_vertices + Col;
    int ij = Row * num_vertices + Col;

    // Algorithm
    if (Row < num_vertices && Col < num_vertices) {  // does this cause branch divergence?
        if (isfinite(matrix[ik]) &&
            isfinite(matrix[kj]) &&
            matrix[ik] + matrix[kj] < matrix[ij]) {

            matrix[ij] = matrix[ik] + matrix[kj];
        }
    }
}
```
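To tell whether a variant really is slower, it helps to time the whole k-loop with CUDA events instead of a single launch. Below is a sketch that assumes `float` distances. `fw_step_original` is the kernel logic above, instantiated by hand for `float`. `time_floyd_warshall` is a hypothetical helper that takes the step kernel as a function pointer, so any variant with the same signature can be timed on the same input.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// The kernel logic from the question, written out for float (INFINITY = no edge).
__global__ void fw_step_original(float* m, int k, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float ik = m[row * n + k];
        float kj = m[k * n + col];
        if (isfinite(ik) && isfinite(kj) && ik + kj < m[row * n + col]) {
            m[row * n + col] = ik + kj;
        }
    }
}

using FwStep = void (*)(float*, int, int);

// Hypothetical helper: copies h to the device, runs all n steps of `step`, copies the
// result back into h, and returns the GPU time of the k-loop in milliseconds.
float time_floyd_warshall(FwStep step, std::vector<float>& h, int n) {
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    float* d = nullptr;
    cudaError_t err = cudaMalloc(&d, bytes);
    assert(err == cudaSuccess);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(32, 8);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int k = 0; k < n; ++k) {
        step<<<grid, block>>>(d, k, n);  // same stream, so step k+1 sees step k's writes
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return ms;
}
```

When comparing two kernels, give each run its own copy of the input, because the helper overwrites `h`. Do a warm-up run first, since the first launch in a process may include one-time loading costs.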