Question about working with vectors


I wanted to pass a vector to my kernel, but then I figured out that CUDA does not support std::vector (right?).

So the only way to work with vectors seems to be using thrust::host_vector and thrust::device_vector.

The problems now are:

  1. I don’t understand 100% how to use them.

  2. My vector consists of structs, and each struct contains another vector. Which vector type should I use inside the struct: host or device?

It looks like this:

struct Sol {
    vector<float> v;
};

vector<Sol> x;


So, once more, this is exactly what I want to do:

- Generate a vector of a defined struct (the struct itself contains vectors).

- Pass the vector to the GPU (kernel), where I want to calculate something that is saved in the struct.

- Copy the vector back to the host.

Any idea how to do this?

Does nobody have an idea, or did I explain it too confusingly?

No ideas, I suspect. Generally speaking, doubly nested variable-length arrays are a bad idea in CUDA, since it is hard to access them in a coalesced way. If there is a way to flatten your data structure, it will be much easier to work with on the GPU.
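One way to picture the flattening suggested here is to concatenate every inner vector into a single array and keep a second array of start offsets. The following is only a host-side sketch under assumptions: `Sol` stands in for the struct from the question, and `FlatSols`, `flatten`, and `unflatten` are hypothetical names that do not come from CUDA or Thrust.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the struct from the question (assumed shape).
struct Sol {
    std::vector<float> v;
};

// Hypothetical flat layout: all inner vectors stored back to back.
struct FlatSols {
    std::vector<float> values;        // concatenated contents of every Sol::v
    std::vector<std::size_t> offsets; // offsets[i] = start of record i; size = count + 1
};

// Pack the nested vectors into one value array plus an offset array.
FlatSols flatten(const std::vector<Sol>& sols) {
    FlatSols flat;
    flat.offsets.reserve(sols.size() + 1);
    flat.offsets.push_back(0);
    for (const Sol& s : sols) {
        flat.values.insert(flat.values.end(), s.v.begin(), s.v.end());
        flat.offsets.push_back(flat.values.size());
    }
    return flat;
}

// Rebuild the nested form, e.g. after results have been copied back to the host.
std::vector<Sol> unflatten(const FlatSols& flat) {
    assert(!flat.offsets.empty());
    std::vector<Sol> sols(flat.offsets.size() - 1);
    for (std::size_t i = 0; i + 1 < flat.offsets.size(); ++i) {
        auto first = flat.values.begin() + static_cast<std::ptrdiff_t>(flat.offsets[i]);
        auto last = flat.values.begin() + static_cast<std::ptrdiff_t>(flat.offsets[i + 1]);
        sols[i].v.assign(first, last);
    }
    return sols;
}
```

With this layout, element j of record i sits at values[offsets[i] + j]. Both members are plain contiguous buffers, so each one could be moved to the device with a single cudaMemcpy or copied into a thrust::device_vector, and a kernel would only need the two raw pointers.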

OK, I have changed my code, but it still doesn’t work.

Now I use arrays instead of vectors.

My struct looks like this:

const int M = 10;
const int N = 10;

struct Solution{
	float solutions[M];
	float values[N];
};


And what I do is this here:

Solution *sol = new Solution[2];   // host array
Solution *d_solution;              // device pointer
size_t size = 2 * sizeof(Solution);

cudaMalloc((void **) &d_solution, size);
cudaMemcpy(d_solution, sol, size, cudaMemcpyHostToDevice);

__global__ void methodName(Solution *d_solution){
	// This is the point where my code crashes
	d_solution[0].values[0] = 1.0f;
}


My code works until the kernel call. When I try to access the solution, it crashes (without an error).

What am I doing wrong? I’m really confused.

Aside from the host/device pointer issue which caused the crash, this is still a bad idea. In fact, it’s exactly the same as the vector version but without the memory leak protection the STL provides. You do not want to go pointer chasing on the GPU. You want to have one big, flat, 1D array, and compute indices into that.

Okay, I understand the problem, but how should I solve it?
And my second question: apart from it being a bad idea, don’t arrays of structures work if the structures contain arrays?
I am new to CUDA, and it looks like there are many things I haven’t understood yet :D

You basically have two 1D arrays: one for ‘solutions’ and the other for ‘values’. You index into these separately, with the 1D indices computed as if from the original 2D index. The matrix multiplication example in the Programming Guide shows you how to do this. In that case the matrices are allocated as 1D arrays, with computed 2D indices.
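To make the index arithmetic concrete, here is a rough host-side sketch of the two-array layout, assuming the M and N constants from the question. `FlatSolutions` and `flatIndex` are hypothetical names chosen for illustration, and std::vector is used only so the sketch runs on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int M = 10; // entries of 'solutions' per record, as in the question
const int N = 10; // entries of 'values' per record, as in the question

// Row-major 1D index of entry `col` in record `row`, where each record holds `width` floats.
inline std::size_t flatIndex(std::size_t row, std::size_t col, std::size_t width) {
    return row * width + col;
}

// Hypothetical replacement for an array of Solution structs: two flat 1D arrays.
struct FlatSolutions {
    std::size_t count;
    std::vector<float> solutions; // count * M floats
    std::vector<float> values;    // count * N floats

    explicit FlatSolutions(std::size_t n)
        : count(n), solutions(n * M, 0.0f), values(n * N, 0.0f) {}

    // Same element as sol[s].solutions[m] in the struct version.
    float& solution(std::size_t s, std::size_t m) {
        assert(s < count && m < static_cast<std::size_t>(M));
        return solutions[flatIndex(s, m, M)];
    }

    // Same element as sol[s].values[n] in the struct version.
    float& value(std::size_t s, std::size_t n) {
        assert(s < count && n < static_cast<std::size_t>(N));
        return values[flatIndex(s, n, N)];
    }
};
```

For example, fs.value(1, 3) refers to values[1 * N + 3], which is values[13] here. On the GPU, each of the two arrays would be its own device allocation passed to the kernel as a float pointer, and the kernel would compute the same row * width + col index.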

As a side point, this is what you should be doing on the CPU anyway - if you want maximum performance. The CPU caches help smooth off the edges, but they can’t do everything. I say this while glaring at the current problem in my inbox, which surrounds something like array[i][j][k].structure, where structure is 264 bytes long…