Sending std::vector to kernel

Hi all,

I am having trouble figuring out how a CUDA kernel receives a std::vector.

My host code is something like:

#include <vector>

using namespace std;

const int N = 10;  

vector<int> tmp;

for(int i = 0; i < N; i++) tmp.push_back(i);

vector<int>* tmp2;

cudaMalloc((void**)&tmp2, tmp.size() * sizeof(int));

cudaMemcpy(tmp2, &tmp, tmp.size() * sizeof(int), cudaMemcpyHostToDevice);

(First: is the above part right in and of itself?)

Then I call the kernel:

square_vector <<< nblocks, block_size >>> (tmp2, N);

The kernel is declared as:

__global__ void square_vector(int *a, int N)

What should the specification of the kernel incoming parameters be? How do you pass to it a vector?

And then how would I index the vector components?

I know some people will probably reply and say not to use C++ vectors, but I know this has been done successfully, and I’d like to know what I’m doing wrong above.

Thank you!
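
For what it's worth, two things in the host code above look wrong: tmp2 is declared as vector<int>* although the device buffer only holds raw ints, and &tmp is the address of the vector object itself, not of its elements (tmp.data() or &tmp[0] gives the element storage). Below is a minimal sketch of the plain-pointer route, assuming the kernel is meant to square each element in place. The helper names square_on_device and check are made up for illustration, and the block size of 256 is an arbitrary choice.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: turn a CUDA error code into an exception.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess)
        throw std::runtime_error(std::string(what) + ": " + cudaGetErrorString(err));
}

// The kernel only ever sees a raw int pointer; elements are indexed as a[idx].
__global__ void square_vector(int *a, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)                      // guard the threads past the end
        a[idx] = a[idx] * a[idx];
}

// Hypothetical helper: copy a std::vector to the device, square it, copy it back.
std::vector<int> square_on_device(const std::vector<int> &host)
{
    std::vector<int> result(host.size());
    if (host.empty()) return result;

    const int n = static_cast<int>(host.size());
    const size_t bytes = n * sizeof(int);

    int *d_a = nullptr;               // int*, not vector<int>*
    check(cudaMalloc((void **)&d_a, bytes), "cudaMalloc");

    // host.data() points at the contiguous element storage of the vector.
    check(cudaMemcpy(d_a, host.data(), bytes, cudaMemcpyHostToDevice), "copy to device");

    const int block_size = 256;       // assumed value, tune as needed
    const int nblocks = (n + block_size - 1) / block_size;
    square_vector<<<nblocks, block_size>>>(d_a, n);
    check(cudaGetLastError(), "kernel launch");

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(result.data(), d_a, bytes, cudaMemcpyDeviceToHost), "copy to host");
    check(cudaFree(d_a), "cudaFree");
    return result;
}
```

With tmp filled as in the question (0 through 9), square_on_device(tmp) should hand back 0, 1, 4, and so on up to 81. The vector itself never leaves the host; only its contents are copied.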

Oh man, this took way longer than it should have to get working, but here is the solution for some other poor souls out there:

vector<int> tmp;

for(int i = 0; i < N; i++) tmp.push_back(i);

size_t size = N * sizeof(int);

cudaArray* cuArray = 0;

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<int>();

cudaError_t error = cudaMallocArray(&cuArray, &channelDesc, N, 1);  // width is in elements, not bytes

cudaError_t error1= cudaMemcpyToArray(cuArray, 0, 0, (void*)&tmp[0], size, cudaMemcpyHostToDevice);

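Then hand it to the kernel as a cudaArray. The kernel can't index it like a plain pointer, though; it has to read it through a texture (or surface) backed by the array.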
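
In case it helps, here is a rough sketch of what reading a cudaArray inside a kernel can look like, using a texture object. This is one possible way to do it, not necessarily what the code above ended up doing. Assumptions: the array is allocated with height 0 so it is a true 1D array and tex1D applies, cudaMemcpy2DToArray is used because cudaMemcpyToArray is deprecated in recent toolkits, and the names square_from_array, square_via_cuda_array and check_cuda are invented for the example. Keep in mind that 1D CUDA arrays have a hardware-dependent maximum width.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: turn a CUDA error code into an exception.
static void check_cuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess)
        throw std::runtime_error(std::string(what) + ": " + cudaGetErrorString(err));
}

// Fetches element idx through the texture object and stores its square in out.
__global__ void square_from_array(cudaTextureObject_t tex, int *out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        // Unnormalized coordinates with point sampling: idx + 0.5f hits texel idx.
        int v = tex1D<int>(tex, idx + 0.5f);
        out[idx] = v * v;
    }
}

// Hypothetical helper: vector -> cudaArray -> texture read in kernel -> vector.
std::vector<int> square_via_cuda_array(const std::vector<int> &host)
{
    const int n = static_cast<int>(host.size());
    std::vector<int> result(n);
    if (n == 0) return result;
    const size_t bytes = n * sizeof(int);

    // Width is given in elements; height 0 requests a 1D array.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<int>();
    cudaArray_t arr = nullptr;
    check_cuda(cudaMallocArray(&arr, &desc, n, 0), "cudaMallocArray");

    // Copy a single row of `bytes` bytes from the vector's storage.
    check_cuda(cudaMemcpy2DToArray(arr, 0, 0, host.data(), bytes, bytes, 1,
                                   cudaMemcpyHostToDevice), "copy to array");

    // Describe the array as a texture resource with raw integer reads.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;      // integer data cannot be filtered
    td.readMode = cudaReadModeElementType;    // return the stored int as-is
    td.normalizedCoords = 0;
    cudaTextureObject_t tex = 0;
    check_cuda(cudaCreateTextureObject(&tex, &res, &td, nullptr), "create texture");

    int *d_out = nullptr;
    check_cuda(cudaMalloc((void **)&d_out, bytes), "cudaMalloc");

    const int block_size = 256;               // assumed value
    const int nblocks = (n + block_size - 1) / block_size;
    square_from_array<<<nblocks, block_size>>>(tex, d_out, n);
    check_cuda(cudaGetLastError(), "kernel launch");

    check_cuda(cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost), "copy to host");

    check_cuda(cudaDestroyTextureObject(tex), "destroy texture");
    check_cuda(cudaFreeArray(arr), "cudaFreeArray");
    check_cuda(cudaFree(d_out), "cudaFree");
    return result;
}
```

Note that the texture path is read-only from the kernel's side, which is why the squares go into a separate output buffer. If the kernel just needs to read and write ints by index, a plain int* from cudaMalloc is usually the simpler choice.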