How to Modify a Sequential Algorithm to Expose Parallelism?

I currently have an algorithm running on the CPU that I want to modify for a GPU implementation. The problem statement goes as follows: I have a sequence of polygons and a sequence of vertices extracted from the polygons. Each vertex has its neighbors within the polygon it belongs to. The vertices are all adjusted in a loop. In each iteration of the loop, after adjustment, the vertices are checked one by one to avoid overlap. In the current CPU implementation, the vertices are checked sequentially, and if an overlap is identified between the current vertex and its next neighbor, some delta is added to the current vertex.
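To make this concrete, here is a minimal sketch of what such a sequential pass might look like; the vertex layout, the overlap test, and the delta below are illustrative placeholders, not the actual production code:

```cpp
#include <vector>
#include <cstddef>

struct Vertex { float x, y; };

// Placeholder overlap test: two vertices are "too close" if their distance
// falls below some minimum spacing.
bool overlaps(const Vertex& a, const Vertex& b, float minDist) {
    float dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy < minDist * minDist;
}

void resolveOverlapsSequential(std::vector<Vertex>& v, float minDist, float delta) {
    // The fix applied to vertex i changes what vertex i+1 sees, which is
    // exactly the serial dependency that blocks a straightforward GPU port.
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        if (overlaps(v[i], v[i + 1], minDist)) {
            v[i].x += delta;   // nudge the current vertex by some delta
        }
    }
}
```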

This algorithm is sequential in nature, as the processing of each vertex depends heavily on what has been done before. In my understanding, replicating this algorithm in parallel on a GPU is impossible. My next question, then, is how to modify a sequence of vertices (objects, in general) slightly in parallel so that there is no overlap. Any advice from an algorithmic perspective?

I admit I don’t really understand what this algorithm is trying to accomplish. I think this could be an X-Y problem, that is, a description of one particular solution rather than a description of the task one is actually trying to accomplish.

Example: X = What is the best way to build a natural gas pipeline from North America to Europe? Y = Task: Move natural gas from North America to Europe. Looking at Y then leads to the solution of using a fleet of LNG tanker ships instead of building a pipeline.

One way of approaching such issues is to identify the name for the class of algorithms the algorithm belongs to, then search on the name plus “GPU”.

How many vertices does one polygon have? Do you have many polygons? Are those vertices updated just once, or over thousands of iterations?

Is it some kind of active contour / snake like algorithm?

Perhaps one thread or one warp processes one polygon. With just one thread per polygon, the algorithm is no different from the CPU version; the parallelization would come from the sequence of polygons. Or are the polygons (and not only the vertices) sequentially dependent on each other?
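In code, the one-thread-per-polygon variant might look roughly like this, assuming the vertices are stored contiguously with a per-polygon offset array (all names and the overlap test are illustrative):

```cuda
struct Vertex { float x, y; };

__device__ bool overlaps(const Vertex& a, const Vertex& b, float minDist) {
    float dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy < minDist * minDist;
}

// polyStart has numPolys + 1 entries; polygon p owns vertices
// [polyStart[p], polyStart[p + 1]).
__global__ void resolvePerPolygon(Vertex* verts, const int* polyStart,
                                  int numPolys, float minDist, float delta) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= numPolys) return;
    // The original sequential fix-up runs unchanged inside one polygon;
    // the parallelism comes from handling many polygons at once.
    for (int i = polyStart[p]; i + 1 < polyStart[p + 1]; ++i) {
        if (overlaps(verts[i], verts[i + 1], minDist)) {
            verts[i].x += delta;
        }
    }
}
```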

More background: in OPC (optical proximity correction), the optical mask has to be adjusted so that after light propagates through the mask onto the wafer, the printed pattern matches the chip design. The mask geometry is updated over rounds of modification to find the optimal boundary condition for light propagation. During adjustment of the mask, we have to avoid the situation where two neighboring vertices overlap with each other. Formally speaking, I am seeking a parallel algorithm that does the following: given a sequence s of mathematical objects, what is the optimal way to adjust this sequence so that for any two indices i and j, if i is not equal to j, then s(i) is not equal to s(j)?

Google Scholar returns more than 300 hits when I search for “optical proximity correction” + “GPU”. A lot of the newer papers seem to involve machine learning techniques, but there are also more than 60 papers from before 2014 that do not use ML. Was there nothing useful in these publications applicable to the problem at hand, or at least bits of information that suggest a possible path to a solution?

Why are you looking for a parallel algorithm? Presumably because of speed. Are there many masks to process in parallel? Many vertices per mask? Many rounds of modification? Or do you need low latency to meet real-time requirements?

Yes. We are trying to minimize the time of OPC for real use cases with millions of vertices or more.

If it is many millions of vertices per mask, you should try to find a way to update them in parallel: e.g., update each vertex independently, afterwards look for conflicts and revert the conflicting positions to their state before the update, and then check again whether reverting produced additional conflicts. Then you can port the parallel algorithm to the GPU more easily.
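A rough sketch of that update-then-revert idea, assuming (purely for illustration) 1D positions and a minimum-spacing conflict test:

```cuda
// Mark every vertex that ended up too close to either neighbor after the
// independent update. x holds the updated positions.
__global__ void markConflicts(const float* x, int n, float minDist, int* conflict) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    bool left  = (i > 0)     && fabsf(x[i] - x[i - 1]) < minDist;
    bool right = (i + 1 < n) && fabsf(x[i] - x[i + 1]) < minDist;
    conflict[i] = (left || right) ? 1 : 0;
}

// Revert every conflicting vertex to its position before the update.
__global__ void revertConflicts(float* x, const float* xPrev,
                                const int* conflict, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (conflict[i]) x[i] = xPrev[i];
}
```

The host side would then loop: update all vertices, run markConflicts and revertConflicts, and re-run markConflicts until no new conflicts appear (counted, for example, with an atomic or a reduction).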

It might be something that could be accomplished with a scan operation.

Suppose I represent vertices as 1 entries in a vector, and “empty space” as 0. Let’s declare that vertices are “too close” if they are adjacent, i.e., in the vector we have … 1 1 …

I first do an adjacent-difference-type operation to determine whether a particular vertex is too close to its neighbor.

Then, in the most basic realization, I do a scan on the output of the adjacent difference to figure out how much to move each vertex. This has the side effect of moving things unnecessarily.

To resolve that, you could adjust your scan to increase the value when a move is indicated and decrease it when a move is not indicated.
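A minimal Thrust sketch of the basic version, under the stated assumptions (a 0/1 occupancy vector, “too close” meaning two adjacent 1s); here multiplication is used as the binary op of adjacent_difference so the flag is 1 exactly where two occupied slots touch, and the inclusive scan then accumulates the shifts:

```cuda
#include <thrust/device_vector.h>
#include <thrust/adjacent_difference.h>
#include <thrust/scan.h>
#include <thrust/functional.h>

int main() {
    // ... 1 1 ... means two vertices sit in adjacent slots (too close).
    int h_occ[] = {1, 0, 1, 1, 0, 1, 1, 1, 0};
    thrust::device_vector<int> occ(h_occ, h_occ + 9);

    // flag[i] = occ[i] * occ[i-1], i.e. 1 exactly when slot i and its left
    // neighbor are both occupied.
    thrust::device_vector<int> flag(occ.size());
    thrust::adjacent_difference(occ.begin(), occ.end(), flag.begin(),
                                thrust::multiplies<int>());
    flag[0] = 0;  // adjacent_difference copies the first element unchanged

    // shift[i] = number of "too close" flags up to and including slot i:
    // in the most basic realization, how many slots the vertex at i moves.
    thrust::device_vector<int> shift(occ.size());
    thrust::inclusive_scan(flag.begin(), flag.end(), shift.begin());
    return 0;
}
```

The refinement described in the previous paragraph would replace the plain inclusive_scan with a scan using a custom operator.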