Believe It or Not: SuperCuda is 10^6(+) times faster than Mathematica

Here Mathematica shines: it is exact, so we can use it to reveal the hidden symmetry.
I shall give them and then explain what every C/CUDA/Numba/OpenACC/Pari/Mathematica … compiler missed,
and then we shall improve the
(base) D:.Nvidia>

                0.00%  74.500us         1  74.500us  74.500us  74.500us  cuLaunchKernel

Some people might say this is crazy, nuts. No, no: necessity is the mother of invention. It is needed.

(fig 1)

(fig 2)

(fig 3)

(fig 4)

(fig 5)

In general, parallelizable loops admit a symmetric group action: when the iterations are independent, permuting their order does not change the result.
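A minimal sketch of that statement, in Python rather than CUDA so it can be run anywhere. The loop body `body` and the helper `loop_sum` are hypothetical names made up for this illustration; the only assumption is that each iteration depends on its own index alone and that the results are combined with an exact, commutative and associative operation (integer addition here).

```python
import random

def body(i):
    # Hypothetical loop body: it reads only its own index i,
    # so no iteration depends on any other iteration.
    return (i * i) % 7

def loop_sum(order):
    # Run the loop in the given iteration order and accumulate exactly.
    total = 0
    for i in order:
        total += body(i)
    return total

n = 1000
natural = list(range(n))

# Apply an element of the symmetric group S_n to the iteration order.
shuffled = natural[:]
random.Random(0).shuffle(shuffled)

# The result is invariant under the permutation.
same = loop_sum(natural) == loop_sum(shuffled)
```

With floating-point accumulation the invariance holds only up to rounding, which is one reason exact arithmetic is convenient for exposing the symmetry.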

(fig FULL)
This is the full symmetry of the Lemniscate case.

Of course, it is related to its intrinsic Z_4 symmetry. The higher-order
cases have more symmetries. That is the source of the Lemniscate recursive relations.
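To show what a Z_4 symmetry buys in practice, here is a toy sketch in Python. It is not the Lemniscate computation itself: it assumes Z_4 acts as the quarter-turn (x, y) -> (-y, x) on integer lattice points, and it counts the points inside a disc. The function names `count_full` and `count_reduced` are invented for this example.

```python
def count_full(r):
    # Brute force: visit every lattice point of the square [-r, r]^2.
    r2 = r * r
    return sum(1
               for x in range(-r, r + 1)
               for y in range(-r, r + 1)
               if x * x + y * y <= r2)

def count_reduced(r):
    # The quarter-turn (x, y) -> (-y, x) generates Z_4 and leaves
    # x^2 + y^2 unchanged. Every nonzero lattice point has exactly one
    # image in the region x >= 1, y >= 0, so it suffices to count that
    # region once and multiply by the orbit size 4.
    r2 = r * r
    quarter = sum(1
                  for x in range(1, r + 1)
                  for y in range(0, r + 1)
                  if x * x + y * y <= r2)
    # The origin is the only fixed point of the rotation; add it once.
    return 1 + 4 * quarter
```

The reduced loop visits roughly a quarter of the points and gives the same count. A larger symmetry group would shrink the fundamental region further.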

By the way, an easier group theory book might be

An Invitation to Representation Theory (Springer)

OK, now I understand: the math is not the standard one for you.
At least I have started writing, and when more people are
interested, it shall become clearer.
Thank you, and apologies for being the “bad guest”.
But there is no royal road to math.
It is clear that if some huge computations can be reduced to a matter of
seconds (the 10^12 case will be finished in less than 30 sec), then there
is something to understand. Symmetry reduction is well known and has been used
in non-linear dynamics, solitons and integrable models for many years.
Again, thank you and many apologies.