Computing the Mahalanobis distance with CUDA


I’m trying to modify the code that computes the Euclidian distance from: to compute instead the Mahalanobis distance.
Is there an example in CUDA ? even with cuBLAS implementation ?

Up to now i found the implementation of the covariance matrix calculation:

__global__ static void
updateA(int rowsA, int colsA, int pitchA,
	const float* d_Xx, const float* d_Xy, const float* d_Xz, 
	const float* d_Yx, const float* d_Yy, const float* d_Yz,
	const float* d_R, const float* d_t,
	float* d_A,
	float sigma_p2){


// compute covariance matrix using cublas

      // cublasSgemm (char transa, char transb, int m, int n, int k, float alpha, 
      //              const float *A, int lda, const float *B, int ldb, float beta, 
      //              float *C, int ldc)
      //   C = alpha * op(A) * op(B) + beta * C,
      // m      number of rows of matrix op(A) and rows of matrix C
      // n      number of columns of matrix op(B) and number of columns of C
      // k      number of columns of matrix op(A) and number of rows of op(B) 
	  cublasSgemm('n', 't',
	              rowsA, rowsA, colsA, // int m, n (rows of A), k (cols of A) ; not op(A)
		      1.0f,         // float alpha
		      d_A, pitchA,  // const float *A, int lda
		      d_A, pitchA,  // const float *A, int lda
		      0.0f,         // float beta
                      d_Covariance, rowsA);


// #define Euclid(a,b,c) ((a)*(a)+(b)*(b)+(c)*(c))
//     float tmp =
//       Euclid(Xx - (R(0)*Yx + R(1)*Yy + R(2)*Yz + t(0)),
//              Xy - (R(3)*Yx + R(4)*Yy + R(5)*Yz + t(1)),
//              Xz - (R(6)*Yx + R(7)*Yy + R(8)*Yz + t(2)) );
//     tmp = expf(-tmp/sigma_p^2)

     float tmpX = Xx - (R(0)*Yx + R(1)*Yy + R(2)*Yz + t(0));
     float tmpY = Xy - (R(3)*Yx + R(4)*Yy + R(5)*Yz + t(1));
     float tmpZ = Xz - (R(6)*Yx + R(7)*Yy + R(8)*Yz + t(2));


     tmpX *= tmpX;
     tmpY *= tmpY;
     tmpZ *= tmpZ;

     tmpX += tmpY;
     tmpX += tmpZ;

     tmpX /= sigma_p2;
     tmpX = expf(-tmpX);

Not sure if the covariance using cublasSgemm is correct.
How do i compute the inverse of the covariance matrix ?

Maybe there is a faster method like in OpenCV that has function call to mahalanobis distance.
Right now i can’t switch to an OpenCV implementation because of the time constraint.

Thank You,

In most cases you can use cuBLAS Strsm():

Also,if you are going to make a cuBLAS call from a device kernel (which I have never done before, usually call from host) , you are going to need a GPU which supports Dynamic Parallelism.

It makes more sense to call from host, does the above work correctly?

Thank you for the info on inverse calculation.

The code works (that’s how it is in the reference code so i did not change it), not sure if the values passed would yield the right result. I will need to cross reference with Matlab.

I’m just surprised nobody implemented the Mahalanobis distance computation using GPU, everybody is just saying how suitable this is for GPU.