CUDA host code compiling

Hi,

I originally posted this on the CUDA for Windows XP board, but thought it may be of more interest here…

I have been looking into using CUDA and OpenCL for a mathematical simulation model. I originally used CUDA, which gave me a speed-up of up to 6x for certain parts of my code. However, when I converted the CUDA code to OpenCL, the speed-up was much smaller (and in some cases the code was actually slower).

I tracked the difference down to the host code: iterating over an STL vector was much slower in the C++ host code of the OpenCL project than in the CUDA host code of the CUDA project. So I tried comparing plain C++ host code with CUDA host code directly, to see whether the compiler makes a difference… and it does.

The simple example is given below:

TestVector.cpp

[codebox]
#include <stdio.h>   // printf (the forum appears to have eaten the header name)
#include <time.h>    // clock, CLOCKS_PER_SEC
#include <vector>

void TestCUDA();

void TestCPP()
{
    std::vector<int> intList;
    unsigned int simulationTime = 10000;

    for (unsigned int i = 0; i < simulationTime; i++)
    {
        int intToAdd = i;
        intList.push_back( intToAdd );

        std::vector<int>::iterator iter = intList.begin();
        while (iter != intList.end())
        {
            ++iter;
        }
    }
}

int main( int argc, char** argv)
{
    clock_t beginTime = clock();
    TestCPP();
    clock_t endTime = clock();
    float differenceMilliSeconds = float(endTime - beginTime) / CLOCKS_PER_SEC * 1000.0f;
    printf("C++ calculation done in %.2f milliseconds.\n", differenceMilliSeconds);

    beginTime = clock();
    TestCUDA();
    endTime = clock();
    differenceMilliSeconds = float(endTime - beginTime) / CLOCKS_PER_SEC * 1000.0f;
    printf("CUDA calculation done in %.2f milliseconds.\n", differenceMilliSeconds);
}
[/codebox]

and the CUDA code is:

TestVector.cu

[codebox]
#include <vector>   // the forum appears to have eaten the header name

__host__ void TestCUDA()
{
    std::vector<int> intList;
    unsigned int simulationTime = 10000;

    for (unsigned int i = 0; i < simulationTime; i++)
    {
        int intToAdd = i;
        intList.push_back( intToAdd );

        std::vector<int>::iterator iter = intList.begin();
        while (iter != intList.end())
        {
            ++iter;
        }
    }
}
[/codebox]

When I run the application I get the following results:

C++ calculation done in 3812.00 milliseconds.

CUDA calculation done in 281.00 milliseconds.

Why is it so much faster in the CUDA host code than in the C++ host code? I thought the host code in a .cu file was compiled with the same compiler as the plain C++ code. Is this not correct?

Any help would be appreciated!

Probably just optimization settings. nvcc turns on a lot of host-compiler optimizations by default, but if the host build for your OpenCL version doesn’t do the same, it makes sense that the resulting code would not be as fast.
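One way to check is to compare what nvcc passes to the host compiler with what your own build uses. A rough sketch, assuming nvcc and MSVC’s cl.exe are on your PATH (the exact flags nvcc forwards vary by toolkit version, so treat the commands below as illustrative):

```shell
# Print the sub-commands nvcc runs, including the cl.exe invocation
# and the optimization flags it passes for host code:
nvcc -v -c TestVector.cu

# Forward a specific optimization level to the host compiler explicitly:
nvcc -Xcompiler "/O2" -c TestVector.cu

# Then make sure the plain C++ / OpenCL build gets the same level:
cl /O2 /EHsc TestVector.cpp
```

If the two builds use different optimization levels, the STL iterator loops (which are heavily inlined at /O2 and above) can easily differ by an order of magnitude.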

That seemed to work, thanks. I changed my optimization setting from /O2 to /Ox and now I get approximately the same speeds. Does anybody know what effect this will have on the output (if any)? MSDN says that “In general, /O2 should be preferred over /Ox and /O1 over /Oxs”, but doesn’t say why. Anyone have any ideas?