I have something working, and I want to make sure it is the right way to do it.
In the first case, I am using OpenMP for the host and CUDA for the device. This is my file layout:
src
|
|--Makefile
|--main.cpp
|--host.h
|--host.cpp
|--device.h
|--device.cu
Inside Makefile:
main_objects = main.o
host_objects = host.o
device_objects = device.o
all: $(main_objects) $(host_objects) $(device_objects)
	g++ -O3 -fopenmp $(main_objects) $(host_objects) $(device_objects) -L/usr/local/cuda/lib64 -lcudart -o main
$(main_objects): $(main_objects:.o=.cpp)
	g++ -O3 -I/usr/local/cuda/include -c $< -o $@
$(host_objects): $(host_objects:.o=.cpp)
	g++ -O3 -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_OMP -fopenmp -c $< -o $@
$(device_objects): $(device_objects:.o=.cu)
	nvcc -O3 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -c $< -o $@
clean:
	rm -f *.o main
Then main.cpp:
#include <thrust/host_vector.h>
#include <thrust/copy.h>
#include <thrust/sort.h>
#include <cstdlib>
#include <iostream>
#include <iterator>
// function prototypes
#include "device.h"
#include "host.h"
int main(void)
{
    // generate 20 random numbers on the host
    thrust::host_vector<int> h_vec(20);
    generate_on_host(h_vec);
    // interface to CUDA code
    sort_on_device(h_vec);
    // print sorted array
    thrust::copy(h_vec.begin(), h_vec.end(), std::ostream_iterator<int>(std::cout, "\n"));
    return 0;
}
Inside host.h:
#pragma once
#include <thrust/host_vector.h>
// function prototype
void generate_on_host(thrust::host_vector<int>& h);
And host.cpp:
#include "host.h"
#include <thrust/random.h>
#include <thrust/generate.h>
// function definition
void generate_on_host(thrust::host_vector<int>& h)
{
    thrust::default_random_engine rng;
    thrust::generate(h.begin(), h.end(), rng);
}
Then, inside device.h:
#pragma once
#include <thrust/host_vector.h>
// function prototype
void sort_on_device(thrust::host_vector<int>& V);
Finally, device.cu:
#include <thrust/sort.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include "device.h"
void sort_on_device(thrust::host_vector<int>& h_vec)
{
    // transfer data to the device
    thrust::device_vector<int> d_vec = h_vec;
    // sort data on the device
    thrust::sort(d_vec.begin(), d_vec.end());
    // transfer data back to host
    thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
}
Everything compiles and runs nicely, so I just want a word of reassurance that I am not practicing any form of dark magic here…
And if I were to use two separate backends, thrust::omp::vector and thrust::cuda::vector, in, say, two different files, device_omp.cpp and device_cuda.cu, is the way they are linked in the Makefile still correct? The only difference I can see is that each file would be compiled with its own flag: -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP for the first and -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA for the second.
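To make the second question concrete, this is roughly the Makefile I have in mind for the two-backend case. It is only a sketch and has not been built: the file names device_omp.cpp and device_cuda.cu are the ones mentioned above, the single `objects` variable and the `main` target are made up for illustration, and I am assuming the Thrust headers live under /usr/local/cuda/include as in the main.o rule of the first Makefile.

```makefile
# Hypothetical two-backend variant of the Makefile above (untested sketch).
objects = main.o device_omp.o device_cuda.o

# Link with g++: -fopenmp pulls in the OpenMP runtime, -lcudart the CUDA runtime.
main: $(objects)
	g++ -O3 -fopenmp $(objects) -L/usr/local/cuda/lib64 -lcudart -o main

main.o: main.cpp
	g++ -O3 -I/usr/local/cuda/include -c $< -o $@

# OpenMP backend: compiled by g++, device system switched to OMP for this file only.
device_omp.o: device_omp.cpp
	g++ -O3 -fopenmp -I/usr/local/cuda/include -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -c $< -o $@

# CUDA backend: compiled by nvcc, device system set explicitly to CUDA.
device_cuda.o: device_cuda.cu
	nvcc -O3 -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -c $< -o $@

clean:
	rm -f *.o main
```

The idea is that each translation unit picks its own device system at compile time, and the link step stays the same as in the first case.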