Believe It or Not: SuperCuda is 10^6(+) times faster than Mathematica

Here Mathematica shines: it is exact, so we can use it to reveal the hidden symmetry.
I shall give them, then explain what every C/CUDA/Numba/OpenACC/Pari/Mathematica … compiler missed,
and then we shall improve the
(base) D:.Nvidia>

                0.00%  74.500us         1  74.500us  74.500us  74.500us  cuLaunchKernel

Some people might say this is crazy, nuts. No, no: necessity is the mother of invention. It is needed.

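In general, parallelizable loops admit a symmetric group action: when the iterations are independent, they can be permuted in any order without changing the result.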
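To make the symmetric group remark concrete, here is a minimal sketch in plain Python. The function names (`run_loop`, `run_dependent_loop`) and the toy loop bodies are my own illustrative assumptions, not the kernels profiled in this post. It checks that a loop with independent iterations gives the same result under every permutation of the iteration order, while a loop with a carried dependency does not.

```python
import itertools

def run_loop(order, data):
    """Independent loop: iteration i reads and writes only slot i (toy body, assumed)."""
    out = [None] * len(data)
    for i in order:
        out[i] = data[i] * data[i] + 1
    return out

def run_dependent_loop(order, data):
    """Loop with a carried dependency: iteration i reads the slot written by i - 1."""
    out = list(data)
    for i in order:
        if i > 0:
            out[i] = out[i] + out[i - 1]
    return out

data = [3, 1, 4, 1, 5]
n = len(data)
reference = run_loop(range(n), data)

# Every element of S_5 (all 120 orderings) leaves the independent loop's result unchanged.
invariant = all(
    run_loop(perm, data) == reference
    for perm in itertools.permutations(range(n))
)

# The dependent loop is not invariant: forward and reversed orders disagree.
forward = run_dependent_loop(range(n), data)
backward = run_dependent_loop(reversed(range(n)), data)

print(invariant)            # True
print(forward == backward)  # False
```

Integers are used on purpose: with floating-point reductions, reordering can change rounding, so the invariance only holds up to rounding error there.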

This is the full symmetry of the Lemniscate case.

Of course, it is related to its intrinsic Z_4 symmetry. The higher-order
cases have more symmetries. That is the source of the lemniscate recursive relations.
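One standard way to see a Z_4 symmetry in the lemniscate case is through the lemniscatic integral. I am assuming this is the symmetry meant here. The notation F and sl (the lemniscate sine) is mine, not taken from the computation above. Substituting t = iu leaves t^4 unchanged, so multiplication by i commutes with the integral:

```latex
F(x) = \int_0^x \frac{dt}{\sqrt{1 - t^4}},
\qquad
F(ix) = \int_0^{x} \frac{i\,du}{\sqrt{1 - (iu)^4}} = i\,F(x)
\quad\Longrightarrow\quad
\operatorname{sl}(iz) = i\,\operatorname{sl}(z).
```

Applying z -> iz four times returns z, so the cyclic group {1, i, -1, -i}, isomorphic to Z_4, acts on the arguments of sl.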

By the way, an easier group theory book might be:

An Invitation to Representation Theory (SpringerLink)

