Sending a large number of vertices to the kernel

Hi everyone,

I am trying to send a very large number of vertices to the kernel. The kernel will do some computation on these, and send the result of the computations back. The vertices themselves will not be modified.

I know this is a simple task, but I’m a bit lost.

Should I store these n vertices as a vector of vectors using the std::vector class? If so, how would I pass such a thing to the kernel?

Or should I allocate the space with cudaMalloc3D or cudaMalloc3DArray, then fill in that block with the vertices created by float3 vertices = make_float3(x,y,z) ? Where would cudaMemcpy come in?

Or should I use vertex buffer objects?

Or some other technique that I haven’t mentioned? Are any of these legitimate approaches, and what is the best way to do this?

Thanks so much!

Here’s a simple example of how you could do it:

int num_vertices;
// set num_vertices to however many vertices you have

size_t memsize = sizeof(float) * num_vertices;

float *h_x = (float *) malloc(memsize);
float *h_y = (float *) malloc(memsize);
float *h_z = (float *) malloc(memsize);

for (int i = 0; i < num_vertices; i++) {
  // fill in entries of h_x, h_y, h_z
}

float *d_x, *d_y, *d_z;

CUDA_SAFE_CALL(cudaMalloc((void**) &d_x, memsize));
CUDA_SAFE_CALL(cudaMalloc((void**) &d_y, memsize));
CUDA_SAFE_CALL(cudaMalloc((void**) &d_z, memsize));

CUDA_SAFE_CALL(cudaMemcpy(d_x, h_x, memsize, cudaMemcpyHostToDevice));
CUDA_SAFE_CALL(cudaMemcpy(d_y, h_y, memsize, cudaMemcpyHostToDevice));
CUDA_SAFE_CALL(cudaMemcpy(d_z, h_z, memsize, cudaMemcpyHostToDevice));

// Now call your kernel, passing in d_x, d_y, d_z

(I’ve used the CUDA_SAFE_CALL macro from cutil.h in the SDK (common/inc/cutil.h), which only checks error codes if _DEBUG is #defined. Roll your own error checking if required.)
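To round out the sketch, a kernel consuming those three arrays might look like the following. The kernel name, the block size, and the per-vertex computation (Euclidean length) are just illustrative assumptions; substitute whatever computation you actually need.

```cuda
__global__ void vertex_length(const float *x, const float *y, const float *z,
                              float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // example computation: length of each vertex as a vector
        out[i] = sqrtf(x[i] * x[i] + y[i] * y[i] + z[i] * z[i]);
    }
}

// Host side: launch enough blocks to cover all vertices,
// then copy the results back (d_out/h_out allocated like the arrays above).
int threads = 256;
int blocks = (num_vertices + threads - 1) / threads;
vertex_length<<<blocks, threads>>>(d_x, d_y, d_z, d_out, num_vertices);
CUDA_SAFE_CALL(cudaMemcpy(h_out, d_out, memsize, cudaMemcpyDeviceToHost));
```

Because each thread with index i reads x[i], y[i], and z[i], consecutive threads touch consecutive floats in each array, which is exactly the access pattern that coalesces well.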

Regarding the options you mention:

  • Passing C++ objects (especially STL) to the GPU generally does not work. You will be much more successful with plain C-style arrays and simple structs.

  • cudaMalloc3D() and cudaMalloc3DArray() are for allocating three-dimensional arrays, not arrays of 3D objects (like vertices). cudaMalloc3D() just allocates a big linear chunk of memory anyway, padding the dimensions where they would otherwise cause alignment problems. cudaMalloc3DArray() allocates a special cudaArray object, which is required for 3D textures.

  • Creating a C-style array of float3 objects is on the right track and would certainly work. The only problem with float3 is that it is the wrong size for coalesced memory access on older devices (GeForce 8 and 9 series), which can only coalesce reads of 32, 64, and 128 bits per thread. You can access an array of float3 in a coalesced way by using shared memory as a staging area, but that is a little complicated when you are starting out. Keeping a separate array for each of the x, y, and z components is a simple way to avoid the problem.

  • Vertex buffer objects are a DirectX/OpenGL concept, and don’t exist in CUDA except for graphics interoperability (e.g. writing a CUDA kernel whose output is going to be rendered directly).