I am trying a simple double-precision matrix-matrix multiplication using a combined CPU+GPU approach. On the CPU I am using cblas_dgemm(…) linked against Intel MKL 10.0.3, and on the GPU I am using CUBLAS with CUDA driver 3.2. The CPU is an Intel Xeon E5420 (dual-socket, quad-core) at 2.5 GHz with a peak of 80 GFLOPS, and the GPU is a Tesla C1060 in a PCIe x16 Gen2 slot. Here are some of the results I am getting:

- GPU only --> CUBLAS --> each matrix 12288 x 12288 --> 71.1 GFLOPS sustained (double precision)
- CPU+GPU dgemm --> CUBLAS + CBLAS --> each matrix 12288 x 12288 --> 142.8 GFLOPS sustained (double precision, dividing matrix B equally between the CPU and the GPU)
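
To make the split in the second result concrete, here is a toy sketch of one way to divide B between two workers. It assumes a column-wise split (the post does not say how B was divided), and both halves run as plain Python here; `matmul`, `split_columns`, `hybrid_matmul` and `cpu_share` are hypothetical names standing in for the real cblas_dgemm and CUBLAS calls.

```python
# Toy sketch: C = A*B computed as [A*B_cpu | A*B_gpu] with B split by columns.
# Assumption: a column-wise split; the original post does not specify this.

def matmul(a, b):
    """Naive product of two row-major lists-of-lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def split_columns(b, n_left):
    """Return (left, right) column blocks; left holds the first n_left columns."""
    return [row[:n_left] for row in b], [row[n_left:] for row in b]

def hybrid_matmul(a, b, cpu_share=0.5):
    """Multiply a by each column block of b and stitch the results together."""
    n_left = int(len(b[0]) * cpu_share)
    b_cpu, b_gpu = split_columns(b, n_left)
    c_cpu = matmul(a, b_cpu)  # stands in for the cblas_dgemm call
    c_gpu = matmul(a, b_gpu)  # stands in for the CUBLAS dgemm call
    return [left + right for left, right in zip(c_cpu, c_gpu)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]

# The stitched result must match the unsplit product.
assert hybrid_matmul(A, B) == matmul(A, B)
```

An equal split (`cpu_share=0.5`) only balances the load when both devices sustain roughly the same rate; otherwise a share proportional to each device's measured rate would be the natural thing to try.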

I am taking the total double-precision peak for CPU+GPU to be 80 + 78 = 158 GFLOPS.
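
For reference, the two peak figures can be reproduced as units x clock x flops per cycle. The per-cycle numbers below are assumptions not stated in the post: 4 double-precision flops per cycle per core for the E5420 (SSE), and for the C1060 30 multiprocessors at 1.296 GHz, each with one double-precision multiply-add unit (2 flops per cycle). `peak_gflops` is a hypothetical helper name.

```python
# Sketch of the theoretical peak arithmetic behind 80 + 78 = 158 GFLOPS.
# The flops-per-cycle and GPU clock values are assumptions, see the text above.

def peak_gflops(units, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS for `units` identical execution units."""
    return units * clock_ghz * flops_per_cycle

# 2 sockets x 4 cores at 2.5 GHz, 4 DP flops per cycle per core (assumed)
cpu_peak = peak_gflops(units=8, clock_ghz=2.5, flops_per_cycle=4)

# 30 multiprocessors at 1.296 GHz, one DP multiply-add per cycle each (assumed)
gpu_peak = peak_gflops(units=30, clock_ghz=1.296, flops_per_cycle=2)

print(f"CPU peak: {cpu_peak:.2f} GFLOPS")
print(f"GPU peak: {gpu_peak:.2f} GFLOPS")
print(f"Combined: {cpu_peak + gpu_peak:.2f} GFLOPS")
```

Under these assumptions the GPU figure comes out slightly under 78, so the combined peak is a little below 158 GFLOPS.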

The sustained performance I am getting is 142.8 GFLOPS.
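
Expressed as a fraction of peak, using only the figures quoted in this post, the numbers work out as in the sketch below. `efficiency` is a hypothetical helper, and the "implied CPU rate" assumes the GPU still delivers its GPU-only rate during the hybrid run, which may not hold exactly.

```python
# Sketch: sustained performance as a fraction of theoretical peak.

def efficiency(sustained_gflops, peak_gflops):
    """Fraction of peak actually sustained (1.0 would be 100%)."""
    return sustained_gflops / peak_gflops

gpu_eff = efficiency(71.1, 78.0)        # GPU-only run
hybrid_eff = efficiency(142.8, 158.0)   # CPU+GPU run

# Assumption: the GPU contributes its GPU-only rate in the hybrid run,
# so the remainder is attributed to the CPU.
implied_cpu_gflops = 142.8 - 71.1
cpu_eff = efficiency(implied_cpu_gflops, 80.0)

print(f"GPU only : {gpu_eff:.1%} of peak")
print(f"CPU+GPU  : {hybrid_eff:.1%} of peak")
print(f"CPU share: {implied_cpu_gflops:.1f} GFLOPS, {cpu_eff:.1%} of peak")
```

All three ratios land at roughly 90% of the stated peaks.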

My questions are:

--> Are these results acceptable?

--> How do I decide whether the sustained performance I obtained is correct or not?
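
One possible sanity check, offered as a sketch rather than a definitive answer, is to recompute the rate from the wall-clock time using the conventional 2*m*n*k operation count for dgemm and confirm it stays below the theoretical peak. The helper names are hypothetical, and the times printed are derived from the GFLOPS figures above, not measured.

```python
# Sketch: relate wall-clock time and GFLOPS for an m x k by k x n dgemm,
# using the conventional 2*m*n*k floating-point operation count.

def dgemm_flops(m, n, k):
    """Multiply-add count for C = A*B, counted as 2 flops each."""
    return 2 * m * n * k

def sustained_gflops(m, n, k, seconds):
    """Rate achieved if the whole product took `seconds` of wall-clock time."""
    return dgemm_flops(m, n, k) / seconds / 1e9

def implied_seconds(m, n, k, gflops):
    """Wall-clock time that a given sustained rate corresponds to."""
    return dgemm_flops(m, n, k) / (gflops * 1e9)

def is_plausible(sustained, peak):
    """A sustained rate above theoretical peak points to a timing or counting error."""
    return 0.0 < sustained <= peak

N = 12288
t_gpu = implied_seconds(N, N, N, 71.1)      # GPU-only figure from the post
t_hybrid = implied_seconds(N, N, N, 142.8)  # CPU+GPU figure from the post

print(f"GPU only: about {t_gpu:.1f} s")
print(f"CPU+GPU : about {t_hybrid:.1f} s")
print(is_plausible(142.8, 158.0))
```

If the measured wall-clock time (including any host-device transfers that were timed) is close to these values and the numerical result matches a CPU-only reference, the reported rate is at least self-consistent.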