Implementation of cublasStrsm

I am trying to solve the problem A*X=B,
where A =

 1     2
 2     1

and B =

 8     7
 7     8

Therefore, X should be

 2     3
 3     2

I am implementing the following code:

#define IDX2C(i,j,ld) ((j*ld)+i)			// square matrix

void matrixPrintf(char* format, float* matrix,int dim1,int dim2)
{
	for (int i=0;i<dim1;i++)
	{
		for (int j=0;j<dim2;j++)
			printf(format,matrix[IDX2C(i,j,dim1)]);
		printf("\n");
	}
};

int main(int argc, char **argv)
{ 
	float A[] = { 1, 2, 2, 1 };
	float B[] = { 8, 7, 7, 8 };
	float* d_A, * d_B;

	cudaMalloc((void **) &d_A, 4*sizeof(float));
	cudaMemcpy(d_A, A, 4*sizeof(float), cudaMemcpyHostToDevice);
	cudaMalloc((void **) &d_B, 4*sizeof(float));
	cudaMemcpy(d_B, B, 4*sizeof(float), cudaMemcpyHostToDevice);

	printf("A = \n");
	matrixPrintf("%8.2g ",A,2,2);
	printf("B = \n");
	matrixPrintf("%8.2g ",B,2,2);
	// initialize cublas
	cublasHandle_t handle;
	cublasCreate(&handle);
	const float alpha = 1;

	cublasStatus_t status =
	cublasStrsm(handle, CUBLAS_SIDE_LEFT, CUBLAS_FILL_MODE_UPPER,
		CUBLAS_OP_N, CUBLAS_DIAG_NON_UNIT, 2, 2, &alpha,
		d_A, 2, d_B, 3);
	if (status != CUBLAS_STATUS_SUCCESS)
	{
            printf("cublasTgemm returned error code %d, line(%d)\n", status, __LINE__);
            exit(EXIT_FAILURE);
	}

	std::fill(B,B+4,float(0));
	cudaDeviceSynchronize();
	cudaMemcpy(B, d_B, 4*sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(d_A);
        cudaFree(d_B);

	printf("B(final) = \n");
	matrixPrintf("%8.2g ",B,2,2);

	return 0;
}

My results are:
A =
1 2
2 1
B =
8 7
7 8
B(final) =
-6 7
7 8

Can any one see what I am doing wrong?

Thanks -

Could someone try my code and see if they get the same results? Is this a bug, or am I doing something wrong?

Thanks!

TRSM is expecting a triangular matrix.
You can’t use it directly with A, you need to do a LU decomposition of A before calling it.
Take a look at MAGMA or CULA.

I don’t agree with your expected result. Maybe a confusion about triangular vs symmetric matrices, or the meaning of “left”? The BLAS reference for STRSM specifies:

SIDE = 'L' or 'l'   op( A )*X = alpha*B

Looking just at the first column vector of B, we have (with upper triangular A, and side=LEFT):

1 2  *  x1  =  8
0 1     x2  =  7

which should result in x2 = 7, x1 = -6, since -6 * 1 + 2 * 7 = 8, and 1 * 7 = 7.

Thanks to both of you. I was assuming that TRSM did the decomposition within - so yes, my input is symmetric, not decomposed.