Tool to calculate the speedup of an application that runs on GPU-based heterogeneous computing plat

May I know whether there is any tool available to compare the total execution time of an application that first runs solely on a high-performance CPU and then runs on an NVIDIA Tesla GPU-based heterogeneous computer, thereby calculating the ‘nX’ acceleration the application obtains by using the GPU?

It is not clear to me what you are asking. Are you looking for a methodology for predicting the speedup one could achieve by switching a particular application from a CPU-only to a GPU-accelerated version, or do you have CPU-only and GPU-accelerated versions of an application in hand and simply want to measure the speedup?

If it is the former, I am afraid there is no tool (at least I am not aware of one). However, by analyzing some key performance characteristics of the application you may be able to derive a reasonable estimate using, for example, the roofline model.
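To make the roofline idea a bit more concrete, here is a minimal sketch in Python. The roofline model bounds attainable performance by the smaller of peak compute throughput and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The machine numbers and the function names below are made up for illustration; they do not describe any real CPU or GPU, and the model ignores things like host-to-device transfer time.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Attainable GFLOP/s under the basic roofline model."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)


def estimated_speedup(cpu, gpu, intensity_flops_per_byte):
    """Ratio of the two roofline bounds at a given arithmetic intensity."""
    cpu_perf = roofline_gflops(cpu["peak_gflops"], cpu["bandwidth_gbs"],
                               intensity_flops_per_byte)
    gpu_perf = roofline_gflops(gpu["peak_gflops"], gpu["bandwidth_gbs"],
                               intensity_flops_per_byte)
    return gpu_perf / cpu_perf


# Hypothetical machine parameters, for illustration only.
cpu = {"peak_gflops": 500.0, "bandwidth_gbs": 50.0}
gpu = {"peak_gflops": 4000.0, "bandwidth_gbs": 300.0}

for intensity in (0.25, 12.0, 64.0):
    estimate = estimated_speedup(cpu, gpu, intensity)
    print(f"intensity {intensity:6.2f} FLOP/byte -> estimated speedup {estimate:.1f}x")
```

The point of the sketch is that the estimate depends on where the application sits: a memory-bound code is limited by the bandwidth ratio, a compute-bound code by the peak throughput ratio. It is an upper-bound style estimate, not a prediction of what a real port will achieve.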

If it is the latter, simply measure the time to completion for whatever task you are interested in. If the execution time is small, you may want to employ a high-resolution timing facility provided by your operating system of choice, e.g. gettimeofday() on Linux. The ratio of the respective execution times is the speed-up.
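As a rough sketch of that measurement, here is a small timing harness in Python. It uses time.perf_counter() as the high-resolution timer in place of gettimeofday(), and takes the best of several runs to reduce noise. The two workload functions are stand-ins I made up so the example runs anywhere; in practice they would be the CPU-only and GPU-accelerated versions of your application.

```python
import time


def best_time(func, *args, repeats=5):
    """Return the shortest wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def speedup(baseline_seconds, accelerated_seconds):
    """Speed-up is the ratio of the two execution times."""
    return baseline_seconds / accelerated_seconds


# Stand-in workloads (hypothetical): both compute the same sum of squares.
def reference_version(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def alternate_version(n):
    return sum(i * i for i in range(n))


n = 200_000
t_ref = best_time(reference_version, n)
t_alt = best_time(alternate_version, n)
print(f"reference: {t_ref:.4f} s, alternate: {t_alt:.4f} s, "
      f"speedup: {speedup(t_ref, t_alt):.2f}x")
```

One caveat when timing real GPU code from the host: kernel launches return before the kernel finishes, so the GPU work has to be synchronized (for example with cudaDeviceSynchronize()) before the stop time is read, otherwise the measured time is too short.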

Thank you for sparing time for me. I meant the first one, but with a slight modification. That is, I am looking for a tool rather than a methodology for comparing the speedup achieved by a particular application when going from a CPU-only to a GPU-accelerated version.
If there is no such tool, then I would like to know how researchers are able to claim that they have achieved a 5X or 10X speedup of a particular application. Can you please shed some light on this?

In the simplest case: by stopwatch.

CPU takes 15 seconds

GPU takes 1.5 seconds

Yay.

It’s quite possible, and indeed common, that a program (e.g. CUDA) written to run on a GPU may not be runnable on a system without a GPU. Therefore there is no general purpose tool that can in all cases take any given executable and run it on either a GPU based system or a CPU based system (and yield useful information).

Creating a program that can be run either with or without a GPU requires multiple code paths and a programming methodology to support this. In addition it requires a “realization” of an algorithm for the CPU and another for the GPU.

People who compare CPU vs GPU execution performance have two implementations or realizations. That may be either in a single program, due to multiple code paths and programming support, or in 2 separate programs, one designed and compiled to run on the GPU and another designed and compiled to run on the CPU.
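A minimal sketch of the "single program, multiple code paths" arrangement might look like the following Python. Everything here is hypothetical: the backend names are just labels, and the second realization is a plain radix sort standing in for whatever a real GPU code path would do. The part worth noting is the check that both realizations produce the same answer before any timing comparison is made.

```python
def sort_reference(values):
    """First realization: the built-in comparison sort."""
    return sorted(values)


def sort_alternate(values):
    """Second realization (stand-in): LSD radix sort for non-negative ints."""
    out = list(values)
    max_value = max(out, default=0)
    shift = 0
    while (max_value >> shift) > 0:
        buckets = [[] for _ in range(256)]
        for v in out:
            buckets[(v >> shift) & 0xFF].append(v)
        out = [v for bucket in buckets for v in bucket]
        shift += 8
    return out


# Hypothetical backend table: one entry per code path in the same program.
BACKENDS = {"cpu": sort_reference, "gpu": sort_alternate}


def run(values, backend):
    """Dispatch to the requested code path."""
    return BACKENDS[backend](values)


data = [5, 3, 9, 1, 300, 70000, 0, 42]
results_match = run(data, "cpu") == run(data, "gpu")
print("results match:", results_match)
```

With two separate programs the idea is the same, except that the dispatch happens when you choose which executable to run.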

There are no general statements that govern this practice. A comparison between a CPU-based implementation and a GPU based implementation necessarily depends on specifics. Furthermore, the opinion of what constitutes a “valid” comparison will vary from case to case, and person to person.

To take a specific example, a researcher may create an application that runs on CPU (only) that sorts 32M integers. They may then create another application (or possibly the same application, with 2 different code paths), that sorts 32M integers on the GPU. They may compare execution times, and say, “the same operation (sorting 32M integers) took 10 seconds on the CPU and 2 seconds on the GPU, so I got a 5x speedup”.

The judgement about whether that statement is valid/useful/interesting will depend on many many factors, as you can probably imagine.

(Execution time is not the only way to compare performance. I’m just using it as an example here.)

The vast majority of the many papers I have seen that report speedups for GPU-accelerated applications show measured times or suitable performance metrics (such as ns/day for molecular simulations) for particular tasks performed by that application, for both CPU-only and GPU-accelerated versions. In other words, they compare measured performance numbers, not projections.

The reported speed-up is then simply the ratio of the respective execution times or performance metric values. Various commonly-used applications already include performance reporting mechanisms and do not require additional instrumentation; in the case of existing benchmark codes, that is even the main purpose of the application.

Personally, I would consider the performance methodology of some papers as weak, in that they report GPU-accelerated performance versus single-core CPU-only performance, although CPU-only performance could be improved significantly by using a multi-threaded, multi-core implementation. This leads to controversial reported speed-ups of 100x, 200x, etc. The best papers compare highly optimized, multi-threaded CPU implementations with an optimized GPU-accelerated version. Such papers tend to report speedups in the range you mention, e.g. 5x to 10x.
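To illustrate how much the choice of baseline matters, here is a small Python sketch. The execution times are invented for illustration and are not taken from any paper; the function name is mine as well.

```python
def speedups_vs_baselines(times_seconds, accelerated="gpu"):
    """Speed-up of the accelerated run relative to each other entry."""
    accel_time = times_seconds[accelerated]
    return {
        name: seconds / accel_time
        for name, seconds in times_seconds.items()
        if name != accelerated
    }


# Hypothetical measured times for the same task, in seconds.
times = {"cpu_1_thread": 200.0, "cpu_16_threads": 16.0, "gpu": 2.0}

for baseline, factor in speedups_vs_baselines(times).items():
    print(f"GPU vs {baseline}: {factor:.0f}x")
```

With these made-up numbers the same GPU run looks like a 100x win against a single thread but an 8x win against the multi-threaded CPU version, which is why it helps to state exactly which baseline a reported speedup refers to.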

[Later:] I was still typing while txbob posted. He makes a good point that some applications are designed from the start for GPU-accelerated heterogeneous computing, and thus there is no equivalent CPU-only version one could run for performance comparison purposes. There are at least two molecular simulation packages of that nature I am aware of. In those cases, performance comparisons will likely be based on different GPUs, different numbers of GPUs, or different application versions. I would expect the number of such applications designed from the start for heterogeneous platforms to increase over time.

but then… if you report multi-threaded/multi-core performance, you might as well consider reporting multi-GPU performance in your papers (for problems that scale across GPUs).

Absolutely. Multi-GPU performance is certainly of interest to many, and it would be great to see those numbers published, in particular as I would expect multi-GPU configurations to increase in popularity, as all possible parallelization opportunities are exploited. Not all applications are multi-GPU capable yet, however, and in some cases researchers may simply not have the funds to set up multi-GPU systems.

While not a paper, the AMBER benchmark page is a nice example for a performance comparison that shows multi-threaded CPU-only, single-GPU accelerated, and multi-GPU accelerated results:

http://ambermd.org/gpus/benchmarks.htm

As txbob points out, pure performance comparisons are not the only interesting comparisons; one might also want to compare derived metrics such as performance/watt or performance/dollar.
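A quick sketch of such derived metrics in Python, using a throughput-style metric where higher is better (such as ns/day). All numbers below are invented for illustration, and the power and price figures are assumptions about whole systems, not vendor data.

```python
def derived_metrics(performance, watts, dollars):
    """Performance per watt and per dollar for one system."""
    return {
        "performance": performance,
        "perf_per_watt": performance / watts,
        "perf_per_dollar": performance / dollars,
    }


def improvement(baseline, accelerated):
    """Ratio accelerated/baseline for every metric (higher is better)."""
    return {key: accelerated[key] / baseline[key] for key in baseline}


# Hypothetical systems: performance in ns/day, power in W, cost in dollars.
cpu_only = derived_metrics(performance=10.0, watts=200.0, dollars=2000.0)
gpu_accelerated = derived_metrics(performance=60.0, watts=300.0, dollars=4000.0)

for metric, factor in improvement(cpu_only, gpu_accelerated).items():
    print(f"{metric}: {factor:.1f}x")
```

Note that for a throughput metric the ratio is accelerated over baseline, whereas for execution times it is baseline over accelerated. The derived ratios can also come out noticeably smaller than the raw performance ratio once power and cost are taken into account.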

I am personally fond of research papers that show increased quality of solution (e.g. accuracy) along with lower run times. Faster and better science, so to speak.

Thanks to all for your responses. Let me be more precise. As part of my research I need to show that a modified search algorithm, after some optimizations, runs faster when implemented on a GPU computing platform than on an FPGA [it actually does]. But how can I show this in my paper? Also, I don’t know how to generate a compiler report or a tool report to estimate the total execution time on the heterogeneous computing platform.