equivalent to CUDA events CPU time functions

I am wondering what would be the equivalent of CUDA's start and stop event timing functions on the CPU side?
GetTickCount()? clock()?

Do CUDA event functions measure performance in milliseconds?
Please advise, and thanks.


I am not entirely sure what you are asking. If you are looking for a high-precision timer to time your host code, you may want to try the following code, which I have been using for well over a decade.

#if defined(_WIN32)
#if !defined(WIN32_LEAN_AND_MEAN)
#define WIN32_LEAN_AND_MEAN
#endif
#include <windows.h>
static double second (void)
{
    LARGE_INTEGER t;
    static double oofreq;
    static int checkedForHighResTimer;
    static BOOL hasHighResTimer;

    if (!checkedForHighResTimer) {
        hasHighResTimer = QueryPerformanceFrequency (&t);
        oofreq = 1.0 / (double)t.QuadPart;
        checkedForHighResTimer = 1;
    }
    if (hasHighResTimer) {
        QueryPerformanceCounter (&t);
        return (double)t.QuadPart * oofreq;
    } else {
        return (double)GetTickCount() / 1000.0;
    }
}
#elif defined(__linux__) || defined(__APPLE__)
#include <stddef.h>
#include <sys/time.h>
static double second (void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1000000.0;
}
#else
#error unsupported platform
#endif

Hello and thanks for your time and help.
I want to time only individual functions in my Host code and then compare them with GPU equivalent functions.
I am trying to find a simple function that measures the elapsed time of my function in milliseconds.
The above code is too complicated for me!

If you want to time host function execution with a high-resolution timer, the code I posted should work well for you, because that is exactly what I wrote it for and what I use it for. You do not need to worry about the implementation details. The code looks a little obscure because it has conditional code branches for Linux, Windows, and Mac OS X, which are the OS platforms supported by CUDA. Simply include the snippet at the start of your code and then call second(), like so:

double start, stop, elapsed;
start = second();
[ .... code under test ...]
stop = second();
elapsed = stop - start; // execution time in seconds, with microsecond resolution

Make sure you warm up caches, etc., when you time host code. You would not want to time the first execution of a piece of code.
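For example, a minimal warm-up harness might look like the sketch below. work() is just a placeholder for whatever host function you want to measure, and for brevity it only uses the Linux/Mac branch of the timer:

```c
#include <stddef.h>
#include <sys/time.h>

/* Linux/OSX branch of the timer posted above. */
static double second(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1000000.0;
}

/* Hypothetical stand-in for the host function under test. */
static double work(int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += (double)i * 0.5;
    return sum;
}

/* Run once untimed to warm caches and page in code, then time
   the second run. */
static double timed_run(int n)
{
    volatile double sink;
    sink = work(n);            /* warm-up pass, not timed */
    double start = second();
    sink = work(n);            /* measured pass */
    double stop = second();
    (void)sink;
    return stop - start;
}
```

Calling timed_run() then gives you the elapsed time for a run that starts with warm caches.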

I will use your advice.

However, I used a different approach, and I want you to tell me if it makes sense.

double diffclock(clock_t clock1, clock_t clock2)
{
    double diffticks = clock1 - clock2;
    double diffms = (diffticks * 1000) / CLOCKS_PER_SEC;
    return diffms;
}

// and then:
clock_t begin = clock();

// function to measure goes here
clock_t end = clock();
cout << "Time elapsed: " << diffclock(end, begin) << " ms" << endl;
return 0;

//does it make sense?

As far as I recall, clock() provides a low-resolution clock, where the resolution is 1/60 or 1/100 of a second. This resolution is too low for accurately timing individual functions on modern CPUs, unless they happen to be very long-running functions.
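If you want to check the actual granularity of clock() on your own system, a quick probe like this (just a sketch) spins until the reported value changes:

```c
#include <time.h>

/* Spin until clock() advances and report the observed tick in
   milliseconds. This is the effective resolution of clock() on
   this system, which may be coarser than 1/CLOCKS_PER_SEC. */
static double clock_tick_ms(void)
{
    clock_t t0 = clock();
    clock_t t1;
    do {
        t1 = clock();
    } while (t1 == t0);
    return 1000.0 * (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

If the reported tick is 10 ms or more, timing anything shorter than a few hundred milliseconds with clock() is hopeless.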

Norbert, your Linux/OSX timing harness can be slightly improved in accuracy. The gettimeofday() function returns the number of seconds and microseconds since the 1970 start epoch. This means that a measurement now, 44 years later, has a microsecond counter value of about 44 years * 365 days * 24 hours * 3600 seconds * 1000000 microseconds, which is about 2^50.4. This is close enough to the 53 effective bits of mantissa in a double that you get precision-limited representation errors when differencing the start and end times over very short (on the order of a microsecond) intervals.
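To put a number on it: near a 2014-era Unix timestamp of about 1.4e9 seconds, one ULP (the gap between adjacent doubles) works out to roughly a quarter of a microsecond. Here is a small check, with the ULP found by simple halving rather than any library call:

```c
/* Find the spacing between adjacent doubles near x by halving a
   candidate gap until adding it no longer changes x. The volatile
   store keeps the comparison in true double precision. */
static double ulp_near(double x)
{
    volatile double probe;
    double gap = 1.0;
    for (;;) {
        probe = x + gap / 2.0;
        if (probe == x)
            break;
        gap /= 2.0;
    }
    return (x + gap) - x;
}
```

ulp_near(1.4e9) comes out to 2^-22 seconds, about 0.24 us, so sub-microsecond differences of absolute timestamps are right at the edge of what a double can represent.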

A quick fix is to offset the epoch to effectively make it zero centered in say year 2015. This leaves enough mantissa bits for another decade or so.
Just update the return statement of the gettimeofday() branch in your code to be:

const time_t Epoch2015 = 45UL*365*24*3600; 
  return (double)(tv.tv_sec-Epoch2015) + (double)tv.tv_usec / 1000000.0;

Looking a bit deeper, the real problem is that while the gettimeofday() call is returning an integer number of microseconds, that count is being divided by 1000000.0, which makes it not exactly representable in floating point. This is usually ignorable except we only have a few bits of mantissa left because of that 50+ bit epoch offset. So an alternative and perhaps superior fix would be to store the number of microseconds, not seconds, and difference those and do the 1000000.0 division on the difference.

Yet another layer of improvement could come by using the finer grained clock in Linux:

#include <time.h>

double newsecond(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1000000000.0;
}

This timer uses an epoch based on the machine boot time, and returns nanoseconds, not microseconds. In practice in x86-64 Linux the timer quantum seems to be about 0.05 us, giving a lot finer results as well. clock_gettime() is a bit more annoying in that it needs the librt library to be linked and is not available on all flavors of UNIX. gettimeofday() is POSIX so it’s extremely portable.

Those are all excellent points one may want to consider if more than microsecond resolution is needed. Not sure whether this is practical. I find that one usually comes up against various sources of noise in modern systems, such as the memory hierarchy. Sub-microsecond timing would also require calibration to subtract out the overhead of the OS functions used to report the time.
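The calibration step can be as simple as timing a batch of back-to-back timer calls; this is a sketch, and timer_overhead() is just an illustrative name:

```c
#include <stddef.h>
#include <sys/time.h>

/* Linux/OSX branch of the timer from earlier in the thread. */
static double second(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1000000.0;
}

/* Average cost of one timer call, estimated over a batch; subtract
   this from very short measurements. */
static double timer_overhead(int calls)
{
    double start = second();
    for (int i = 0; i < calls; i++)
        (void)second();
    double stop = second();
    return (stop - start) / (double)calls;
}
```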

When I started using the posted code a long time ago, I convinced myself that microsecond resolution could be maintained within “double” data until Unix time stamps roll over in 2038, requiring no more than 51 bits and leaving two bits to represent fractional (quarter) microseconds. Thinking about it now, I am not sure whether there actually is a connection between the “Year 2038” problem and the gettimeofday() function; it has been too long since I last looked at this functionality.
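For what it is worth, that bit count is easy to verify mechanically: a 32-bit signed time_t rolls over after 2^31 seconds, which is 2^31 * 10^6 microseconds. Counting the bits (bits_needed() is just a helper for this check):

```c
/* Number of bits needed to represent v. */
static int bits_needed(unsigned long long v)
{
    int bits = 0;
    while (v != 0ULL) {
        bits++;
        v >>= 1;
    }
    return bits;
}
```

bits_needed(2147483648ULL * 1000000ULL) is 51, which indeed leaves two of a double's 53 effective mantissa bits for quarter-microsecond fractions.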