release and emulation release comparison

Electro · February 26, 2009, 2:20pm

Hello,

I’ve written a simple program that calls the cublasSgemm() function and times the computation in order to measure the effective processing power.
To compare the processing power of the Tesla card with that of the host computer, I built the program twice: without any flag for the first version, and with the “emu=1” flag (device emulation) for the second one.

Depending on the size of the input matrix I pass to cublasSgemm, I get a huge difference between the release build and the emulation build. (I use the same matrix for both inputs of the function in order to reduce transfers between the device and the host.)
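To check what the GPU returns, it can help to have a plain host version of the same product. cublasSgemm follows the BLAS convention C = alpha·op(A)·op(B) + beta·C with column-major storage. The function below, `host_sgemm_nn`, is a hypothetical name used only for illustration. It is a minimal sketch that covers only the no-transpose case and uses a naive triple loop, so it is much slower than an optimized BLAS library.

```c
/* Hypothetical reference routine (not part of CUBLAS): computes
   C = alpha * A * B + beta * C for the no-transpose case only.
   Storage is column-major as in BLAS/CUBLAS: element (i, j) of a matrix
   with leading dimension ld is at index i + j * ld.
   A is m x k, B is k x n, C is m x n. */
void host_sgemm_nn(int m, int n, int k, float alpha,
                   const float *A, int lda,
                   const float *B, int ldb,
                   float beta, float *C, int ldc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float sum = 0.0f;
            /* Dot product of row i of A with column j of B:
               one multiply and one add per step. */
            for (int p = 0; p < k; ++p)
                sum += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
        }
    }
}
```

As in the post, the same matrix can be passed as both inputs. With A = [1 2; 3 4] stored column-major as {1, 3, 2, 4}, the call `host_sgemm_nn(2, 2, 2, 1.0f, A, 2, A, 2, 0.0f, C, 2)` (with C initialized to zeros) leaves C = {7, 15, 10, 22}, i.e. A·A = [7 10; 15 22].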

The difference in processing power is so large (170 GFLOPS for the Tesla card versus 16 MFLOPS for the single host core used, with a 1600 × 1600 input matrix) that I wonder whether the comparison makes sense…
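The two figures can be related with a little arithmetic, assuming the usual convention that an N × N by N × N product costs 2·N³ floating-point operations (N³ multiplies plus, to first order, N³ adds). For N = 1600 that is 8.192 × 10⁹ operations: about 48 ms at 170 GFLOPS, but about 512 s at 16 MFLOPS, a ratio of roughly 10,600. The helper names below are hypothetical; this is a sketch of the calculation, not the poster's code.

```c
/* Operation count for C = A * B with square n x n matrices, using the
   common 2*n^3 convention. Address arithmetic, host-device transfers and
   the alpha/beta scaling are not counted. */
double sgemm_flops(long n)
{
    double d = (double)n;
    return 2.0 * d * d * d;
}

/* Effective rate in GFLOPS for one n x n product that took `seconds`. */
double sgemm_gflops(long n, double seconds)
{
    return sgemm_flops(n) / seconds / 1.0e9;
}

/* Time one n x n product would take at a given rate in GFLOPS. */
double sgemm_seconds_at(long n, double gflops)
{
    return sgemm_flops(n) / (gflops * 1.0e9);
}
```

For example, `sgemm_seconds_at(1600, 170.0)` is about 0.048 and `sgemm_seconds_at(1600, 0.016)` is 512, which shows how far apart the two measured rates are.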

Does anyone have an idea about that?

Does anyone know of a program that measures processing power? I’m a bit frustrated with the 170 GFLOPS figure, even though this value is calculated only from the number of operations I would need to perform to obtain the same result, so address calculations (or anything else) are not counted in it.
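A minimal way to get such a figure on the host is to time the work and divide the operation count by the elapsed time. The sketch below times a naive, single-threaded C = A·A with standard C `clock()` and reports GFLOPS with the 2·n³ count. `host_matmul_gflops` is a hypothetical helper, and a naive loop like this underestimates what the CPU can do with an optimized BLAS. `clock()` measures the processor time of the calling process, which suits a single-threaded host loop. CUBLAS calls return before the GPU has finished, so device timings normally use CUDA events or a device synchronization before stopping the clock.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not from the thread): times a naive, single-threaded
   C = A * A on the host for an n x n column-major matrix (n > 0) and returns
   the effective rate in GFLOPS, counting 2*n^3 floating-point operations.
   Returns -1.0 if allocation fails, 0.0 if the run was too short for clock(). */
double host_matmul_gflops(int n)
{
    size_t count = (size_t)n * (size_t)n;
    float *A = malloc(count * sizeof *A);
    float *C = malloc(count * sizeof *C);
    if (A == NULL || C == NULL) {
        free(A);
        free(C);
        return -1.0;
    }

    for (size_t i = 0; i < count; ++i)
        A[i] = (float)(i % 7) * 0.5f;        /* arbitrary input data */

    clock_t start = clock();                 /* processor time, standard C */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            float sum = 0.0f;
            for (int p = 0; p < n; ++p)
                sum += A[i + (size_t)p * n] * A[p + (size_t)j * n];
            C[i + (size_t)j * n] = sum;
        }
    }
    clock_t stop = clock();

    /* Use every result so the compiler cannot discard the timed loop. */
    double checksum = 0.0;
    for (size_t i = 0; i < count; ++i)
        checksum += C[i];
    volatile double sink = checksum;
    (void)sink;

    free(A);
    free(C);

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    double flops = 2.0 * (double)n * (double)n * (double)n;
    return seconds > 0.0 ? flops / seconds / 1.0e9 : 0.0;
}
```

Calling `host_matmul_gflops(1600)` gives an unoptimized single-core figure for the same problem size as in the post. It is a lower bound on the host's capability, not a fair match for a tuned GPU library.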

Thanks in advance,

Electro

E.D_Riedijk · February 26, 2009, 2:33pm

No, the comparison does not make sense. Host emulation is a very slow way of doing an SGEMM. For a fair comparison, you should compare against the MKL, for example.

Electro · February 27, 2009, 7:55am

Thanks, but I’m afraid I have no idea what MKL is…
Is it a program?

MisterAnderson42 · February 27, 2009, 1:09pm

First hit on Google:
http://www.intel.com/cd/software/products/asmo-na/eng/307757.htm

Electro · February 27, 2009, 1:45pm

Thanks for the information!!!
