Hi,

I’m trying to modify the code that computes the Euclidian distance from: http://home.hiroshima-u.ac.jp/tamaki/study/cuda_softassign_emicp/ to compute instead the Mahalanobis distance.

Is there an example in CUDA ? even with cuBLAS implementation ?

Up to now i found the implementation of the covariance matrix calculation:

```
__global__ static void
updateA(int rowsA, int colsA, int pitchA,
const float* d_Xx, const float* d_Xy, const float* d_Xz,
const float* d_Yx, const float* d_Yy, const float* d_Yz,
const float* d_R, const float* d_t,
float* d_A,
float sigma_p2){
...
// compute covariance matrix using cublas
// cublasSgemm (char transa, char transb, int m, int n, int k, float alpha,
// const float *A, int lda, const float *B, int ldb, float beta,
// float *C, int ldc)
// C = alpha * op(A) * op(B) + beta * C,
//
// m number of rows of matrix op(A) and rows of matrix C
// n number of columns of matrix op(B) and number of columns of C
// k number of columns of matrix op(A) and number of rows of op(B)
cublasSgemm('n', 't',
rowsA, rowsA, colsA, // int m, n (rows of A), k (cols of A) ; not op(A)
1.0f, // float alpha
d_A, pitchA, // const float *A, int lda
d_A, pitchA, // const float *A, int lda
0.0f, // float beta
d_Covariance, rowsA);
...
// #define Euclid(a,b,c) ((a)*(a)+(b)*(b)+(c)*(c))
// float tmp =
// Euclid(Xx - (R(0)*Yx + R(1)*Yy + R(2)*Yz + t(0)),
// Xy - (R(3)*Yx + R(4)*Yy + R(5)*Yz + t(1)),
// Xz - (R(6)*Yx + R(7)*Yy + R(8)*Yz + t(2)) );
// tmp = expf(-tmp/sigma_p^2)
float tmpX = Xx - (R(0)*Yx + R(1)*Yy + R(2)*Yz + t(0));
float tmpY = Xy - (R(3)*Yx + R(4)*Yy + R(5)*Yz + t(1));
float tmpZ = Xz - (R(6)*Yx + R(7)*Yy + R(8)*Yz + t(2));
__syncthreads();
tmpX *= tmpX;
tmpY *= tmpY;
tmpZ *= tmpZ;
tmpX += tmpY;
tmpX += tmpZ;
tmpX /= sigma_p2;
tmpX = expf(-tmpX);
```

Not sure if the covariance using cublasSgemm is correct.

How do i compute the inverse of the covariance matrix ?

Maybe there is a faster method like in OpenCV that has function call to mahalanobis distance.

Right now i can’t switch to an OpenCV implementation because of the time constraint.

Thank You,

Domi