Algorithm to remove shared vertices

Hi,
I have been playing around with the marching cubes program in the CUDA SDK Code Samples. As far as I can see, the triangles are created one by one and vertices are not shared between triangles. I need an actual connected mesh, so I am looking for a way to make the triangles share vertices.

Example:
Suppose marching cubes creates a surface with two triangles made from four points, p1, p2, p3, p4
We describe the triangles as:
T = [p1, p2, p3, p1, p3, p4]
and
triangleIndex = [0, 1, 2, 3, 4, 5] where every 3-tuple gives the indices in T to create the triangles.

I want a function that takes input: T, triangleIndex
and gives the following output:
Tnew = [p1, p2, p3, p4]
triangleIndexNew = [0, 1, 2, 0, 2, 3]

Is there an (easy) way of doing this without brute-force iteration over the arrays, which is far too expensive?

This “weld vertices” problem can be solved with a simple application of Thrust’s sort and unique algorithms. Basically, the idea is to sort the list of triangle vertices to bring equal vertices together, and then use unique to eliminate redundancies. Finally, you find each vertex’s new index by searching the list of unique vertices in parallel.
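As an illustration, here is a host-side sketch of that idea in 2D using the STL analogues of the Thrust calls (std::sort, std::unique, std::lower_bound); on the device the same pattern applies thrust::sort, thrust::unique and thrust::lower_bound to device_vectors. The Point type and the weld_vertices name are just for illustration, and note that this welds only bitwise-equal vertices:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::pair<float, float>;  // a 2D vertex

// Replace the flat triangle-vertex list `verts` with its de-duplicated
// version, and fill `indices` so that verts[indices[i]] equals the i-th
// original vertex. Note: sorting reorders the surviving vertices.
void weld_vertices(std::vector<Point>& verts, std::vector<unsigned>& indices)
{
    // 1. Sort a copy of the vertex list so equal vertices become adjacent
    //    (thrust::sort on the device).
    std::vector<Point> unique_verts = verts;
    std::sort(unique_verts.begin(), unique_verts.end());

    // 2. Collapse runs of equal vertices (thrust::unique + resize).
    unique_verts.erase(std::unique(unique_verts.begin(), unique_verts.end()),
                       unique_verts.end());

    // 3. For each original vertex, binary-search its position in the
    //    de-duplicated list to get its new index (thrust::lower_bound,
    //    which searches for all vertices in parallel on the device).
    indices.resize(verts.size());
    for (std::size_t i = 0; i < verts.size(); ++i)
        indices[i] = static_cast<unsigned>(
            std::lower_bound(unique_verts.begin(), unique_verts.end(),
                             verts[i]) - unique_verts.begin());

    verts.swap(unique_verts);
}
```

With the four points of the example above (say p1=(0,0), p2=(1,0), p3=(1,1), p4=(0,1)), the six input vertices collapse to four, and the index list refers back into the sorted unique list.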

We’ve added example code demonstrating this technique in two dimensions, but it should be straightforward to extend the solution to 3D.

Thank you very much, that was exactly what I was looking for.

This is cool, I thought about trying something like this when I wrote that marching cubes sample, but gave up because it was too complicated!

It’s useful to have shared vertices if you want to do post-processing on the mesh (e.g. smoothing).

I’d be curious to find out how much time this takes compared to the mesh generation.

So, I picked up this project (mesh smoothing) again yesterday to try some things out, and I ran into memory problems when handling large meshes (1.5M+ vertices). I have a couple of questions.

  1. I pass my data in as a raw pointer (call it dPositions) and then I do:
thrust::device_ptr<float4> dPtr(dPositions);
thrust::device_vector<float4> input(dPtr, dPtr + vertexCount);

Judging by the bad_alloc errors I get when I run the last line, it seems that the data is copied and new space is allocated. Is this correct? If so, is it possible to make the vector use the data behind the device_ptr without copying it, or should I just use dPtr directly with the algorithms?

  2. Is the sorting done in-place?

  3. The problem I have is that I cannot seem to allocate vectors over ~10 MB (I have 256 MB on the device, of which about 110 MB is used by the OS). Is there any way of “stitching” together multiple smaller allocations and using them so that the algorithms think they are accessing one contiguous piece of memory?

  1. Yes, creating a new device_vector from a range will allocate new memory and copy the input range. If memory consumption is a concern, you should wrap the raw pointer dPositions in a thrust::device_ptr (as you already do with dPtr) and pass that range to the algorithms directly, rather than creating a new device_vector.

  2. No, the sorts allocate a temporary buffer for “ping-ponging” between intermediate results.

  3. Not currently.
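To illustrate the first point: because the algorithms take iterator ranges, wrapping the raw pointer avoids any copy. Here is a minimal host-side analogue with std::sort; on the device the same pattern would be thrust::sort(dPtr, dPtr + vertexCount), where dPtr is a thrust::device_ptr wrapping dPositions. The sort_in_place name is just for illustration:

```cpp
#include <algorithm>
#include <cstddef>

// Sort the caller's buffer in place through the raw pointer: no new
// allocation for the data itself, and no copy. This mirrors passing
// (dPtr, dPtr + vertexCount) to thrust::sort instead of first copying
// the range into a new device_vector.
void sort_in_place(float* data, std::size_t n)
{
    std::sort(data, data + n);
}
```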