Hello,
I need to compute the ratio of global memory accesses to floating-point computations in each thread for the following piece of code:
if (Row < NUM_OBJ && Column < NUM_OBJ) {
    // compute the Euclidean distance for this row and column
    for (long k = 0; k < FEAT_LEN; k++) {
        result_d += pow(A_d[Row * FEAT_LEN + k] - B_d[k * NUM_OBJ + Column], 2);
    }
    // store the result
    C_d[Row * NUM_OBJ + Column] = sqrt(result_d);
}
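
One possible way to set up the count is a rough host-side sketch that mirrors what a single thread does and tallies the operations. This is not a definitive answer. The names `OpCount` and `count_thread_ops` are hypothetical. It assumes that `result_d` is a per-thread variable held in a register (so it does not count as a global access), that `pow(x, 2)` counts as one floating-point operation, and that the integer index arithmetic is not counted.

```cpp
// Hypothetical tally of what one thread does; not part of the original kernel.
struct OpCount {
    long global_accesses;  // loads from A_d and B_d, plus the store to C_d
    long flops;            // subtract, pow and add per iteration, plus the final sqrt
};

// Mirrors the loop structure of one thread that passes the bounds check.
// Assumption: result_d lives in a register, and pow(x, 2) is one operation.
OpCount count_thread_ops(long feat_len) {
    OpCount c = {0, 0};
    for (long k = 0; k < feat_len; k++) {
        c.global_accesses += 2;  // A_d[Row * FEAT_LEN + k] and B_d[k * NUM_OBJ + Column]
        c.flops += 3;            // one subtract, one pow, one accumulating add
    }
    c.global_accesses += 1;      // store to C_d[Row * NUM_OBJ + Column]
    c.flops += 1;                // sqrt
    return c;
}
```

Under these assumptions the ratio is (2 * FEAT_LEN + 1) / (3 * FEAT_LEN + 1). Inside the loop alone it is 2 global loads per 3 floating-point operations, so the overall ratio approaches 2/3 as FEAT_LEN grows.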