Software GPU with SIMT/SoA Wavefront Execution and AI‑Driven Topology Shaders

Greetings developers,

About a month ago I started developing a software GPU in C++, and it is now around 95% done. It is one of four related projects compiled through Emscripten.
It implements a SIMT / SoA / wavefront‑style architecture, somewhat ARM‑like. It ended up traversing tiles in Hilbert or Morton order, largely as a consequence of the strict rules I enforce in the pipeline:
• no NaNs
• no branching (if)
• no undefined behavior
• no shortcuts
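
To make these rules concrete, here is a minimal sketch of two building blocks they imply: a Morton (Z‑order) tile index computed with pure bit arithmetic, and a per‑lane select over SoA data that replaces an "if" with a mask. It is an illustration under stated assumptions, not the project's actual code: the wavefront width kLanes and the names WaveF32, WaveMask, lessThan and select are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Spread the low 16 bits of v so a zero bit sits between each original bit.
constexpr std::uint32_t part1by1(std::uint32_t v) {
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

// Morton index of tile (x, y): x occupies the even bits, y the odd bits.
constexpr std::uint32_t morton2d(std::uint32_t x, std::uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}

constexpr std::size_t kLanes = 8;  // hypothetical wavefront width

// SoA storage: one array per attribute, one slot per lane.
struct WaveF32  { std::array<float, kLanes> v; };
struct WaveMask { std::array<std::uint32_t, kLanes> m; };  // 0 or 0xFFFFFFFF

// Bit casts via memcpy, which is well defined (no aliasing violation).
inline std::uint32_t toBits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
inline float fromBits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Per-lane comparison producing an all-ones or all-zeros mask.
inline WaveMask lessThan(const WaveF32& a, const WaveF32& b) {
    WaveMask r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r.m[i] = 0u - static_cast<std::uint32_t>(a.v[i] < b.v[i]);
    return r;
}

// Per-lane select: take a where the mask is set, b elsewhere.
// The only control flow is the fixed-count lane loop; no lane diverges.
inline WaveF32 select(const WaveMask& mask, const WaveF32& a, const WaveF32& b) {
    WaveF32 r{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        const std::uint32_t bits =
            (toBits(a.v[i]) & mask.m[i]) | (toBits(b.v[i]) & ~mask.m[i]);
        r.v[i] = fromBits(bits);
    }
    return r;
}
```

With these pieces, select(lessThan(a, b), a, b) gives a per‑lane minimum without any data‑dependent branch, and sorting tiles by morton2d(x, y) yields the Z‑order traversal. A Hilbert order would need a different index function, which is not shown here.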

Originally, my goal was to target WebGL2.
But as the project grew, I started adding my own “extensions”:
• mipmap generation using Catmull‑Rom filtering
• TCS and TES stages (tessellation control / evaluation)
• a GS stage (geometry shader)
• and even experimental “brain‑vs” and “brain‑fs” stages, essentially AI‑driven vertex and fragment shaders.
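
As an illustration of the first extension, the sketch below downsamples one row of texels by 2:1 using the standard Catmull‑Rom weights. It is a guess at the general shape of such a filter, not the project's actual code: the function names, the clamp‑to‑edge addressing, and the final clamp to [0, 1] are all assumptions. A 2D mip level could apply the same pass to rows and then to columns.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Catmull-Rom weights for the four taps around a sample at fraction t in [0, 1].
// The weights always sum to 1; the outer two are negative for 0 < t < 1.
inline std::array<float, 4> catmullRomWeights(float t) {
    const float t2 = t * t;
    const float t3 = t2 * t;
    return {0.5f * (-t3 + 2.0f * t2 - t),
            0.5f * (3.0f * t3 - 5.0f * t2 + 2.0f),
            0.5f * (-3.0f * t3 + 4.0f * t2 + t),
            0.5f * (t3 - t2)};
}

// Halve a row of texels. Output texel i sits midway between source texels
// 2i and 2i+1, so the taps are 2i-1 .. 2i+2 with t = 0.5.
inline std::vector<float> downsampleRow(const std::vector<float>& src) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(src.size());
    const std::ptrdiff_t half = n / 2;
    std::vector<float> dst(static_cast<std::size_t>(half));
    const std::array<float, 4> w = catmullRomWeights(0.5f);
    for (std::ptrdiff_t i = 0; i < half; ++i) {
        float acc = 0.0f;
        for (std::ptrdiff_t k = 0; k < 4; ++k) {
            // Clamp-to-edge addressing (an assumption) keeps every read in range.
            const std::ptrdiff_t idx =
                std::clamp<std::ptrdiff_t>(2 * i - 1 + k, 0, n - 1);
            acc += w[static_cast<std::size_t>(k)] * src[static_cast<std::size_t>(idx)];
        }
        // The negative lobes can overshoot, so clamp back to [0, 1]
        // (assumes normalized texel values).
        dst[static_cast<std::size_t>(i)] = std::clamp(acc, 0.0f, 1.0f);
    }
    return dst;
}
```

At t = 0.5 the weights are -1/16, 9/16, 9/16, -1/16, which makes the result slightly sharper than a box filter at the cost of possible ringing near hard edges.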

One important detail: in this architecture, the shaders themselves describe the topology.
Each stage can emit, subdivide, or restructure primitives, so the pipeline doesn’t assume a fixed primitive layout.
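
One way to picture this is a stage interface where every stage reads a primitive and writes zero or more primitives of any kind to an output stream. The sketch below is only an illustration of that idea, with hypothetical names (Primitive, PrimitiveStream, Stage, runStages), and it uses ordinary control flow for readability rather than the branch‑free style described above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

// A primitive is just a vertex list: 1 = point, 2 = line, 3 = triangle.
struct Primitive { std::vector<Vec3> verts; };
using PrimitiveStream = std::vector<Primitive>;

// A stage consumes one primitive and may emit any number of primitives.
using Stage = std::function<void(const Primitive&, PrimitiveStream&)>;

inline Vec3 midpoint(const Vec3& a, const Vec3& b) {
    return {0.5f * (a.x + b.x), 0.5f * (a.y + b.y), 0.5f * (a.z + b.z)};
}

// Subdivide: one triangle in, four triangles out. Other primitives pass through.
inline void subdivideTriangle(const Primitive& in, PrimitiveStream& out) {
    if (in.verts.size() != 3) { out.push_back(in); return; }
    const Vec3& a = in.verts[0];
    const Vec3& b = in.verts[1];
    const Vec3& c = in.verts[2];
    const Vec3 ab = midpoint(a, b);
    const Vec3 bc = midpoint(b, c);
    const Vec3 ca = midpoint(c, a);
    out.push_back(Primitive{{a, ab, ca}});
    out.push_back(Primitive{{ab, b, bc}});
    out.push_back(Primitive{{ca, bc, c}});
    out.push_back(Primitive{{ab, bc, ca}});
}

// Restructure: one triangle in, three line segments out.
inline void toWireframe(const Primitive& in, PrimitiveStream& out) {
    if (in.verts.size() != 3) { out.push_back(in); return; }
    for (std::size_t i = 0; i < 3; ++i)
        out.push_back(Primitive{{in.verts[i], in.verts[(i + 1) % 3]}});
}

// Run the stages in order; each stage's output stream feeds the next stage.
inline PrimitiveStream runStages(PrimitiveStream input,
                                 const std::vector<Stage>& stages) {
    for (const Stage& stage : stages) {
        PrimitiveStream next;
        for (const Primitive& p : input) stage(p, next);
        input = std::move(next);
    }
    return input;
}
```

Feeding a single triangle through subdivideTriangle and then toWireframe produces twelve line segments, so the primitive type and count at the end of the pipeline are decided entirely by the stages rather than by a fixed draw mode.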

At this point the system behaves like a miniature programmable GPU with custom pipeline stages and dynamic topology generation.

I’m now looking for expert feedback on the architectural choices, potential performance issues, and best practices for this kind of programmable software GPU. Any insights from experienced GPU developers would be greatly appreciated.

— Manuel
C++ / GPU architecture enthusiast