Testing my Tridiagonalization Algorithm

So I used CUBLAS functions to help speed up an implementation of a Lanczos algorithm which transforms a real symmetric matrix into a matrix with a diagonal and sub/super diagonal(the tridiagonal matrix is symmetric itself in the end, so sub/super are equal at related positions). So the problem I am having is how can I test that this algorithm is working correctly? I know this isn’t directly CUDA related, but it seemed like a decent place to ask. Also, if I have problems with the accuracy, I’ll have to post my code for help debugging most likely.

If you have suggestions, anything is helpful!


I’m assuming you’ve written this code from scratch as opposed to modifying an already existing one. Because otherwise comparing against the results of the original code seems like it would work fairly well…

Testing the correctness should be fairly easy. Use a small matrix and compare the eigenvalues you calculate to those that say MATLAB computes. Or just generate a matrix with known eigenvalues. Do this for many small matrices. If they agree, then your algorithm is probably correct.

The problem with something like Lanczos isn’t really correctness but stability. After you’re sure it produces correct results for small matrices, try increasing the size and see what happens. What have you done to ensure that you don’t lose orthogonality of the vectors?

So the eigenvalues of the transformed tridiagonal matrix are going to be exactly equal to those of the original matrix? This was where my confusion lies I think. If this is the case, I think that I did something wrong computation wise. I can link the code if you would like to look at it. And yes, I didn’t modify code but developed it from scratch.

Edit: Currently I am doing nothing to ensure stability. Since you seem to have an idea oh Lanczos, what would you suggest that is relatively easy to implement? It seems like a lot of the information I find on lanczos is a summary of what it is, which isn’t very helpful. However, for a small matrix the results should still be approximately correct, right? In which case I should still be able to test for the correctness of say a 4x4 input matrix.

Ok, I am definitely making a mistake somewhere. I assume its a math mistake and not a misunderstanding of the algorithm. You can view the algorithm at http://en.wikipedia.org/wiki/Lanczos_algorithm

my code is as follows, where gpuVecV0 is the same as the vector V(j-1) and gpuVecV1 is Vj.

cudaMemcpy(gpuA, cpuA, nColumns*nColumns*sizeof(float), cudaMemcpyHostToDevice);

cudaMemcpy(gpuVecV1, cpuVecV1, nColumns*sizeof(float), cudaMemcpyHostToDevice);

for(int i = 0; i < nColumns; i++){  //might be an error in loop length (off  by 1)

	initVector<<<1,4>>>(gpuTemp, 0, nColumns);

	initVector<<<1,4>>>(gpuVecW, 0, nColumns); //sets everying to 0

	cublasSaxpy(nColumns, cpuVecBeta[i], gpuVecV0, 1,  gpuVecW, 1);

	cublasSgemv('t', nColumns, nColumns, 1.0f, gpuA, nColumns, gpuVecV1, 1, -1.0f, gpuVecW, 1);

	cpuVecAlpha[i] = cublasSdot(nColumns, gpuVecW, 1, gpuVecV1, 1);

	cublasSaxpy(nColumns, -1.0f*cpuVecAlpha[i], gpuVecV1, 1, gpuVecW, 1);

	cpuVecBeta[i+1] = cublasSnrm2(nColumns, gpuVecW, 1);

	tempDivisor = 1/cpuVecBeta[i+1];

	cublasSaxpy(nColumns, tempDivisor, gpuVecW, 1, gpuTemp, 1);

	cudaMemcpy(gpuVecV0, gpuVecV1, nColumns*sizeof(float), cudaMemcpyDeviceToDevice);

	cudaMemcpy(gpuVecV1, gpuTemp, nColumns*sizeof(float), cudaMemcpyDeviceToDevice);


	vectorWi_0GPU , 1, 0, vectorVi_1GPU, 1);


I must have made a mistake with one of the cublas functions, but I can’t figure out where it is.

I found 1 mistake, but its still only giving me 1 correct eigenvalue, which is odd to say the least.