When I call cublasGetMatrix(nRows, nColumns, sizeof(float), gpuC, nRows, cpuC, nRows); where gpuC is an array on the device and cpuC is an array on the host, the value I read back is a number that increases by exactly 4 every time I run the program. If I exit, rebuild, and run again, the number is the same as before, just +4. nRows and nColumns are both 2. gpuC should be the product of
Is there something wrong with my Sgemm call? If I copy either gpuA or gpuB straight to cpuC (one at a time, not both), I can read the correct values back out of cpuC.