Hi everyone,
I’m starting out with CUDA, and I wrote a simple example program that computes the maximum value of a vector. After implementing the CUDA version using Thrust, I noticed that the CPU version's total time is lower than the GPU's across different data sizes, so I was quite disappointed with CUDA.
The CUDA implementation is:
#include <cstdlib>
#include <iostream>
#include <vector>

#include <cuda.h>
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/extrema.h>

int main(int argc, char **argv) {
    int n_data = atoi(argv[1]);

    // Fill the host vector with dummy values
    std::vector<float> p(n_data);
    for (int i = 0; i < n_data; i++) {
        p[i] = i;
    }

    // Copy to the device and reduce
    thrust::device_vector<float> p_device_max(n_data);
    thrust::copy(p.begin(), p.end(), p_device_max.begin());
    std::cout << *thrust::max_element(p_device_max.begin(), p_device_max.end()) << std::endl;
    return 0;
}
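To see how much of the GPU time is the host-to-device copy rather than the reduction itself, the two steps could be timed separately with CUDA events (an untested sketch of the same program; the event calls are standard CUDA runtime API functions, but I haven't run this version):

#include <cstdlib>
#include <iostream>
#include <vector>

#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/extrema.h>

int main(int argc, char **argv) {
    int n_data = atoi(argv[1]);
    std::vector<float> p(n_data);
    for (int i = 0; i < n_data; i++) {
        p[i] = i;
    }

    cudaEvent_t start, mid, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&mid);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    // Host-to-device transfer
    thrust::device_vector<float> d(p.begin(), p.end());
    cudaEventRecord(mid);
    // Reduction on the device (plus copying the single result back)
    float m = *thrust::max_element(d.begin(), d.end());
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float copy_ms = 0.0f, reduce_ms = 0.0f;
    cudaEventElapsedTime(&copy_ms, start, mid);
    cudaEventElapsedTime(&reduce_ms, mid, stop);
    std::cout << "max = " << m
              << ", copy: " << copy_ms << " ms"
              << ", reduce: " << reduce_ms << " ms" << std::endl;
    return 0;
}

(Note that the very first CUDA call also pays a one-time context-creation cost, which a whole-program profiler will count as well.)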
And using Intel Profiler to see the total execution time, I noticed that CPU version takes around 2ms to compute with n_data=5000 and GPU takes around 71 ms.
Is this behaviour normal when using CUDA for a single operation, or for operations with low arithmetic intensity, considering that Thrust provides several optimizations?
Thanks in advance.