How to Modify a Sequential Algorithm to Expose Parallelism?

I currently have an algorithm running on the CPU that I want to modify for a GPU implementation. The problem statement goes as follows: I have a sequence of polygons and a sequence of vertices extracted from the polygons. Each vertex has its neighbors within the polygon it belongs to. The vertices are all adjusted in a loop. In each iteration of the loop, after adjustment, the vertices are checked one by one to avoid overlap. In the current CPU implementation, the vertices are checked sequentially, and if an overlap is identified between the current vertex and its next neighbor, some delta is added to the current vertex.
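To make this concrete, here is a minimal sketch of what such a sequential pass might look like; the vertex layout, the overlap test, and the delta below are illustrative placeholders, not the actual production code:

```cpp
#include <vector>
#include <cstddef>

struct Vertex { float x, y; };

// Placeholder overlap test: two vertices are "too close" if their distance
// falls below some minimum spacing.
bool overlaps(const Vertex& a, const Vertex& b, float minDist) {
    float dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy < minDist * minDist;
}

void resolveOverlapsSequential(std::vector<Vertex>& v, float minDist, float delta) {
    // The fix applied to vertex i changes what vertex i+1 sees, which is
    // exactly the serial dependency that blocks a straightforward GPU port.
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        if (overlaps(v[i], v[i + 1], minDist)) {
            v[i].x += delta;   // nudge the current vertex by some delta
        }
    }
}
```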

This algorithm is sequential in nature, as the processing of each vertex depends heavily on what has been done before. In my understanding, replicating this algorithm in parallel on a GPU is impossible. My next question, then, is how to modify a sequence of vertices (objects, in general) slightly in parallel so that there is no overlap. Any advice from an algorithmic perspective?

I admit I don’t really understand what this algorithm is trying to accomplish. I think this could be an X-Y problem, that is, a description of one particular solution rather than a description of the task one is actually trying to accomplish.

Example: X = What is the best way to build a natural gas pipeline from North America to Europe? Y = Task: Move natural gas from North America to Europe. Looking at Y then leads to the solution of using a fleet of LNG tanker ships instead of building a pipeline.

One way of approaching such issues is to identify the name for the class of algorithms the algorithm belongs to, then search on the name plus “GPU”.

How many vertices does one polygon have? Do you have many polygons? Are those vertices updated just once, or over thousands of iterations?

Is it some kind of active contour / snake like algorithm?

Perhaps one thread or one warp processes one polygon. With just one thread per polygon, the algorithm is no different from the CPU version; the parallelization would come from the sequence of polygons. Or are the polygons (and not only the vertices) sequentially dependent on each other?
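In code, the one-thread-per-polygon variant might look roughly like this, assuming the vertices are stored contiguously with a per-polygon offset array (all names and the overlap test are illustrative):

```cuda
struct Vertex { float x, y; };

__device__ bool overlaps(const Vertex& a, const Vertex& b, float minDist) {
    float dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy < minDist * minDist;
}

// polyStart has numPolys + 1 entries; polygon p owns vertices
// [polyStart[p], polyStart[p + 1]).
__global__ void resolvePerPolygon(Vertex* verts, const int* polyStart,
                                  int numPolys, float minDist, float delta) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= numPolys) return;
    // The original sequential fix-up runs unchanged inside one polygon;
    // the parallelism comes from handling many polygons at once.
    for (int i = polyStart[p]; i + 1 < polyStart[p + 1]; ++i) {
        if (overlaps(verts[i], verts[i + 1], minDist)) {
            verts[i].x += delta;
        }
    }
}
```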

More background: in OPC (optical proximity correction), the optical mask has to be adjusted so that after light propagates through the mask onto the wafer, the printed pattern matches the chip design. The mask geometry is updated over rounds of modification to find the optimal boundary condition for light propagation. During adjustment of the mask, we have to avoid the situation where two neighboring vertices overlap with each other. Formally speaking, I am seeking a parallel algorithm that does the following: given a sequence s of mathematical objects, what is the optimal way to adjust this sequence so that for any two indices i and j, if i is not equal to j, then s(i) is not equal to s(j)?

Google Scholar returns more than 300 hits when I search for “optical proximity correction” + “GPU”. A lot of the newer papers seem to involve machine learning techniques, but there are also more than 60 papers from before 2014 that do not use ML. Was there nothing useful in these publications applicable to the problem at hand, or at least bits of information that suggest a possible path to a solution?

Why are you looking for a parallel algorithm? Presumably because of speed. Are there many masks to process in parallel? Many vertices per mask? Many rounds of modification? Or do you need low latency to meet real-time requirements?

Yes. We are trying to minimize the time of OPC for real use cases with millions of vertices or more.

If it is many millions of vertices per mask, you should try to find a way to update them in parallel: e.g., update each vertex independently, afterwards look for conflicts and revert the conflicting positions to their state before the update, and then check again whether reverting produced additional conflicts. Then you can port the parallel algorithm to the GPU more easily.
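A rough sketch of that update-then-revert idea, assuming (purely for illustration) 1D positions and a minimum-spacing conflict test:

```cuda
// Mark every vertex that ended up too close to either neighbor after the
// independent update. x holds the updated positions.
__global__ void markConflicts(const float* x, int n, float minDist, int* conflict) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    bool left  = (i > 0)     && fabsf(x[i] - x[i - 1]) < minDist;
    bool right = (i + 1 < n) && fabsf(x[i] - x[i + 1]) < minDist;
    conflict[i] = (left || right) ? 1 : 0;
}

// Revert every conflicting vertex to its position before the update.
__global__ void revertConflicts(float* x, const float* xPrev,
                                const int* conflict, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (conflict[i]) x[i] = xPrev[i];
}
```

The host side would then loop: update all vertices, run markConflicts and revertConflicts, and re-run markConflicts until no new conflicts appear (counted, for example, with an atomic or a reduction).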

It might be something that could be accomplished with a scan operation.

Suppose I represent vertices as 1 entries in a vector, and “empty space” as 0. Let’s declare that vertices are “too close” if they are adjacent, i.e., in the vector we have … 1 1 …

I first do an adjacent-difference-type operation to determine whether a particular vertex is too close to its neighbor.

Then, in the most basic realization, I do a scan on the output of the adjacent difference to figure out how much to move each vertex. This has the side effect of moving things unnecessarily.

To resolve that, you could adjust your scan to increase the value when a move is indicated and decrease it when a move is not indicated.
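A minimal Thrust sketch of the basic version, under the stated assumptions (a 0/1 occupancy vector, “too close” meaning two adjacent 1s); here multiplication is used as the binary op of adjacent_difference so the flag is 1 exactly where two occupied slots touch, and the inclusive scan then accumulates the shifts:

```cuda
#include <thrust/device_vector.h>
#include <thrust/adjacent_difference.h>
#include <thrust/scan.h>
#include <thrust/functional.h>

int main() {
    // ... 1 1 ... means two vertices sit in adjacent slots (too close).
    int h_occ[] = {1, 0, 1, 1, 0, 1, 1, 1, 0};
    thrust::device_vector<int> occ(h_occ, h_occ + 9);

    // flag[i] = occ[i] * occ[i-1], i.e. 1 exactly when slot i and its left
    // neighbor are both occupied.
    thrust::device_vector<int> flag(occ.size());
    thrust::adjacent_difference(occ.begin(), occ.end(), flag.begin(),
                                thrust::multiplies<int>());
    flag[0] = 0;  // adjacent_difference copies the first element unchanged

    // shift[i] = number of "too close" flags up to and including slot i:
    // in the most basic realization, how many slots the vertex at i moves.
    thrust::device_vector<int> shift(occ.size());
    thrust::inclusive_scan(flag.begin(), flag.end(), shift.begin());
    return 0;
}
```

The refinement described in the previous paragraph would replace the plain inclusive_scan with a scan using a custom operator.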