Hi all,
I'm having trouble figuring out how to pass a std::vector to a CUDA kernel.
My host code is something like:
#include <vector>
using namespace std;
const int N = 10;
vector<int> tmp;
for(int i = 0; i < N; i++) tmp.push_back(i);
vector<int>* tmp2;
cudaMalloc((void**)&tmp2, tmp.size() * sizeof(int));
cudaMemcpy(tmp2, &tmp, tmp.size() * sizeof(int), cudaMemcpyHostToDevice);
(First question: are the allocation and copy above correct on their own?)
Then I call the kernel:
square_vector <<< nblocks, block_size >>> (tmp2, N);
The kernel is declared as:
__global__ void square_vector(int *a, int N)
What should the kernel's parameter list look like? How do I pass a vector to it?
And then how would I index the vector's elements inside the kernel?
–
I know some people will probably reply that I shouldn't use C++ vectors, but I know this has been done successfully, and I'd like to know what I'm doing wrong above.
Thank you!