How to copy 2d vectors with variable sized rows to the GPU?

Hello,

I am creating 2D vectors with variable-sized rows to save on memory space. I am also trying this instead of flattening the entire array, because this way I don’t have to keep track of the indices where each row starts and ends within a flat array.

My code works fine on the host using the standard vector library, but I cannot seem to get this working with Thrust. I would like to copy these vectors over to the GPU, but no matter how I go about it, I keep getting compiler errors. The code below is host-only code.

std::vector<std::vector<double>> zeroVector;
std::vector<double> zeroRow;

// This creates a 2D array whose row i has variable length h_mesh->domainsN[i]
for (int i = 0; i < DOMAINS; i++)
{
	int rowN = h_mesh->domainsN[i];
	for (int j = 0; j < rowN; j++)
	{
		zeroRow.push_back(0);
	}
	zeroVector.push_back(zeroRow);
	zeroRow.clear();
}
	
// Host-side copy, which works. The Thrust "equivalent" (with printf
// called from inside a kernel) fails with a compiler error.
std::vector<std::vector<double>> tempccOld = zeroVector;
printf("The value of T at domain=1, N=30 = %.2f\n", tempccOld[1][30]); // random indices chosen

All I want is for “zeroVector” to be copied over to the GPU exactly as it is. I have many vectors that need to be this exact shape, so if I get one working, the rest will work, and then I can continue on.

Device code won’t work with std::vector, and Thrust doesn’t support vectors of vectors; such arrangements are difficult to copy to the device without flattening. Since neither std::vector nor Thrust can manage such arrangements on the device side, it’s unclear how to answer your question as posed.

From my perspective, it cannot be done using std::vector, and it cannot be done using Thrust (no vectors of vectors). Therefore you must choose some other method, which means managing your data directly instead of in containers. If you are going to manage data directly, flattening is the most sensible option, IMO.

I was hoping I could make a 2D array with each row being a flattened 3D domain. I just wanted to do it to make tracking easier, as well as to make it easy to “add more” later on (each new domain added would just become another row). It’s not necessary, but if I do have a request to improve CUDA, it would be for Thrust to support 2D and 3D vectors.

I already have code working as you suggested, txbob - it’s just one giant flattened array right now. But now I feel like I’m “wasting” computational cycles on mapping functions, making sure what I pull out of that array is at the correct index. First world problems I guess. :p

Unless your code is proven to be computation limited (the CUDA profiler can help determine that), I would not worry about address computation. The rate of growth in computational resources in GPUs is much higher than the rate of growth in memory bandwidth, so with recent GPU generations many application codes have become bandwidth bound. Using jagged arrays with their resulting irregular access patterns may cost more in terms of performance than additional address computation.
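To put that address computation in concrete terms: with a flat data array plus a row-offset table, the per-access cost is one extra load (amortized per row) and one integer add. A hypothetical device-side sketch, assuming both arrays have already been copied to the device (e.g. via thrust::device_vector and thrust::raw_pointer_cast):

```cuda
// Hypothetical kernel: zero one row of a jagged array stored flat.
// data     - all rows concatenated
// rowStart - rowStart[i] is the index where row i begins; rows + 1 entries
__global__ void zeroRowKernel(double* data, const int* rowStart, int row)
{
    int begin = rowStart[row];
    int end   = rowStart[row + 1];
    // Element [row][j] lives at data[begin + j]: one add per access.
    for (int j = begin + threadIdx.x; j < end; j += blockDim.x)
        data[j] = 0.0;
}
```

The arithmetic is identical on the host side, so the same offset table serves both.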