Hi,
I originally posted this on the CUDA for Windows XP board, but thought it might be of more interest here…
I have been looking into using CUDA and OpenCL for a mathematical simulation model. I originally used CUDA, which gave me a speed-up of up to 6 times for certain parts of my code. However, when I converted the CUDA code to OpenCL, the speed-up was much smaller (and in some cases it was actually slower).
I tracked the difference down to the host code: iterating over an STL vector was much slower in the C++ host code of the OpenCL project than in the host code of the CUDA project. So I tried comparing plain C++ host code against CUDA host code directly, to see if the compiler makes a difference…and it does.
The simple example is given below:
TestVector.cpp
[codebox]
#include <stdio.h>   // printf
#include <time.h>    // clock(), CLOCKS_PER_SEC
#include <vector>    // std::vector

void TestCUDA();

void TestCPP()
{
    std::vector<int> intList;
    const unsigned int simulationTime = 10000;
    for (unsigned int i = 0; i < simulationTime; i++)
    {
        intList.push_back( (int)i );
        // Walk the whole vector after every push_back.
        std::vector<int>::iterator iter = intList.begin();
        while (iter != intList.end())
        {
            ++iter;
        }
    }
}
int main( int argc, char** argv )
{
    clock_t beginTime = clock();
    TestCPP();
    clock_t endTime = clock();
    float differenceMilliSeconds = float(endTime - beginTime) / CLOCKS_PER_SEC * 1000.0f;
    printf("C++ calculation done in %.2f milliseconds.\n", differenceMilliSeconds);

    beginTime = clock();
    TestCUDA();
    endTime = clock();
    differenceMilliSeconds = float(endTime - beginTime) / CLOCKS_PER_SEC * 1000.0f;
    printf("CUDA calculation done in %.2f milliseconds.\n", differenceMilliSeconds);
    return 0;
}[/codebox]
and the CUDA code is:
TestVector.cu
[codebox]
#include <vector>   // std::vector

__host__ void TestCUDA()
{
    std::vector<int> intList;
    const unsigned int simulationTime = 10000;
    for (unsigned int i = 0; i < simulationTime; i++)
    {
        intList.push_back( (int)i );
        // Walk the whole vector after every push_back.
        std::vector<int>::iterator iter = intList.begin();
        while (iter != intList.end())
        {
            ++iter;
        }
    }
}[/codebox]
When I run the application I get the following results:
C++ calculation done in 3812.00 milliseconds.
CUDA calculation done in 281.00 milliseconds.
Why is the same loop so much faster in the CUDA host code than in the C++ host code? I thought the host code in the .cu file was compiled with the same compiler as the plain C++ code. Is this not correct?
Any help would be appreciated!