__host__ code compiled using the C++ compiler?

hello,

I have been looking into using CUDA and OpenCL for a mathematical simulation model. I originally used CUDA, which gave me a speed up of up to 6 times for certain aspects of my code. However, when I converted the CUDA code to OpenCL, the speed up was a lot less (and actually slower in some cases).

I tracked down the differences to being in the host code and found that when I was iterating over a STL vector it was much slower in the C++ host code for the OpenCL project than in the CUDA code for the CUDA project. Therefore, I tried just comparing C++ host code with CUDA host code to see if it makes a difference…and it does.

The simple example is given below:

C++ code:

[codebox]
#include <vector>
#include <stdio.h>
#include <time.h>

void TestCUDA();

void TestCPP()
{
    std::vector<int> intList;
    unsigned int simulationTime = 10000;

    clock_t beginTime = clock();

    for (unsigned int i = 0; i < simulationTime; i++)
    {
        int intToAdd = i;
        intList.push_back( intToAdd );

        // Walk the whole vector on every pass of the outer loop.
        std::vector<int>::iterator iter = intList.begin();
        while (iter != intList.end())
        {
            ++iter;
        }
    }

    clock_t endTime = clock();
    float differenceMilliSeconds = float(endTime - beginTime) / CLOCKS_PER_SEC * 1000.0f;
    printf("C++ calculation done in %.2f milliseconds.\n", differenceMilliSeconds);
}

int main( int argc, char** argv)
{
    TestCPP();
    TestCUDA();
}
[/codebox]

CUDA code:

[codebox]
#include <vector>
#include <stdio.h>
#include <time.h>

__host__ void TestCUDA()
{
    std::vector<int> intList;
    unsigned int simulationTime = 10000;

    clock_t beginTime = clock();

    for (unsigned int i = 0; i < simulationTime; i++)
    {
        int intToAdd = i;
        intList.push_back( intToAdd );

        // Same walk over the whole vector as in the C++ version.
        std::vector<int>::iterator iter = intList.begin();
        while (iter != intList.end())
        {
            ++iter;
        }
    }

    clock_t endTime = clock();
    float differenceMilliSeconds = float(endTime - beginTime) / CLOCKS_PER_SEC * 1000.0f;
    printf("CUDA calculation done in %.2f milliseconds.\n", differenceMilliSeconds);
}
[/codebox]

When I run the application I get the following result:

C++ calculation done in 546.00 milliseconds.

CUDA calculation done in 266.00 milliseconds.

As you can see, the code executed in the CUDA __host__ function is identical to the C++ function. I thought that any __host__ code was compiled by the host C++ compiler in exactly the same way as ordinary C++ host code. Why do I get different speeds then?
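One thing I'm going to try next is printing the compiler/STL configuration macros from both translation units, to check whether the .cpp file and the .cu file really end up compiled with the same settings. Something like this (just a sketch; the macro names assume MSVC and might need adjusting for another toolchain):

[codebox]
#include <vector>
#include <stdio.h>

// Call this from both TestCPP() and TestCUDA() and compare the output,
// to see which settings each translation unit was actually built with.
void ReportBuildSettings(const char* tu)
{
#ifdef _DEBUG
    printf("%s: _DEBUG defined\n", tu);
#endif
#ifdef NDEBUG
    printf("%s: NDEBUG defined\n", tu);
#endif
#ifdef _SECURE_SCL
    printf("%s: _SECURE_SCL = %d\n", tu, (int)_SECURE_SCL);
#endif
#ifdef _HAS_ITERATOR_DEBUGGING
    printf("%s: _HAS_ITERATOR_DEBUGGING = %d\n", tu, (int)_HAS_ITERATOR_DEBUGGING);
#endif
#ifdef __CUDACC__
    printf("%s: compiled through nvcc (__CUDACC__ defined)\n", tu);
#endif
}
[/codebox]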

Thanks.

Just thought I'd add that it doesn't matter how long the simulation runs; the code executed in the __host__ function is consistently about twice as fast.
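To be clear about what I mean by changing the simulation length: I just vary simulationTime, roughly like this (a sketch, not the exact code I ran):

[codebox]
#include <vector>
#include <stdio.h>
#include <time.h>

// Sketch: same loop as above, but with the length passed in,
// so the two builds can be compared at several sizes.
void RunTest(unsigned int simulationTime)
{
    std::vector<int> intList;
    for (unsigned int i = 0; i < simulationTime; i++)
    {
        intList.push_back( (int)i );
        std::vector<int>::iterator iter = intList.begin();
        while (iter != intList.end())
            ++iter;
    }
}

int main()
{
    unsigned int sizes[] = { 2500, 5000, 10000, 20000 };
    for (int s = 0; s < 4; s++)
    {
        clock_t begin = clock();
        RunTest(sizes[s]);
        clock_t end = clock();
        printf("simulationTime %u: %.2f ms\n", sizes[s],
               float(end - begin) / CLOCKS_PER_SEC * 1000.0f);
    }
    return 0;
}
[/codebox]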

I've just moved the timing code out of the host functions, in case that was working differently in the CUDA build, but now I get an even more marked difference in speed. The C++ code is now:

[codebox]
#include <vector>
#include <stdio.h>
#include <time.h>

void TestCUDA();

void TestCPP()
{
    std::vector<int> intList;
    unsigned int simulationTime = 10000;

    for (unsigned int i = 0; i < simulationTime; i++)
    {
        int intToAdd = i;
        intList.push_back( intToAdd );

        std::vector<int>::iterator iter = intList.begin();
        while (iter != intList.end())
        {
            ++iter;
        }
    }
}

int main( int argc, char** argv)
{
    clock_t beginTime = clock();
    TestCPP();
    clock_t endTime = clock();
    float differenceMilliSeconds = float(endTime - beginTime) / CLOCKS_PER_SEC * 1000.0f;
    printf("C++ calculation done in %.2f milliseconds.\n", differenceMilliSeconds);

    beginTime = clock();
    TestCUDA();
    endTime = clock();
    differenceMilliSeconds = float(endTime - beginTime) / CLOCKS_PER_SEC * 1000.0f;
    printf("CUDA calculation done in %.2f milliseconds.\n", differenceMilliSeconds);
}
[/codebox]

and the CUDA code is:

[codebox]
#include <vector>

__host__ void TestCUDA()
{
    std::vector<int> intList;
    unsigned int simulationTime = 10000;

    for (unsigned int i = 0; i < simulationTime; i++)
    {
        int intToAdd = i;
        intList.push_back( intToAdd );

        std::vector<int>::iterator iter = intList.begin();
        while (iter != intList.end())
        {
            ++iter;
        }
    }
}
[/codebox]

When I run the application I get the following results!

C++ calculation done in 3812.00 milliseconds.

CUDA calculation done in 281.00 milliseconds.

Why is it so much faster in the CUDA host code than in the C++ host code?