My question is about what this program is doing: http://docs.nvidia.com/cuda/cuda-samples/#matrix-multiplication--cublas-

So, when I run it, I see this:

```
[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "GRID K520" with compute capability 3.0
MatrixA(320,640), MatrixB(320,640), MatrixC(320,640)
```

And it says that it passed. But how can this possibly be? Shouldn't matrix A's columns match matrix B's rows? With these shapes, A * B isn't even defined. And yet, no matter how I change the values of matrix_size.uiHA, matrix_size.uiWA, etc., I get wrong results, unless I preserve the same sort of structure as above, with all three matrices having the same dimensions. It seems that matrix_size.uiHB is useless, aside from allocating memory.

From the comments, it seems like what's actually happening is that Transpose[B] * Transpose[A] is being calculated, and that seems to be what matrixMulCPU is doing: it multiplies the matrices as (wB x wA) * (wA x hA).

So, changing to this:

```
matrix_size.uiWB = 2 * block_size * iSizeMultiple;
matrix_size.uiHB = 2 * block_size * iSizeMultiple;
matrix_size.uiWA = 2 * block_size * iSizeMultiple;
matrix_size.uiHA = 4 * block_size * iSizeMultiple;
matrix_size.uiWC = 2 * block_size * iSizeMultiple;
matrix_size.uiHC = 4 * block_size * iSizeMultiple;
```

says everything is good. If, though, I change matrix_size.uiHB and matrix_size.uiWA to 3, then I get a bunch of errors. I'm at a loss to explain what's going on in this demo. Can someone explain it to me?
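To be concrete, the failing variant I mean is something like this (my reading of "change to 3", i.e. swapping the factor on just those two lines, so the inner dimension uiWA == uiHB still matches):

```c
matrix_size.uiWB = 2 * block_size * iSizeMultiple;
matrix_size.uiHB = 3 * block_size * iSizeMultiple;
matrix_size.uiWA = 3 * block_size * iSizeMultiple;
matrix_size.uiHA = 4 * block_size * iSizeMultiple;
matrix_size.uiWC = 2 * block_size * iSizeMultiple;
matrix_size.uiHC = 4 * block_size * iSizeMultiple;
```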

Thanks!