Mesh Shader threads beyond wave thread count

Amplification and mesh shaders are an awesome addition to shader programming.
The leverage is huge: a single CPU call (DispatchMesh) fans out into a set of amplification shader threads, and each of those can fan out further into a set of mesh shader threads.
While the total number of threads per thread group is limited to 128, it appears that there is no way to use more than the hardware-supported number of threads per wave.
With Nvidia cards this is currently 32.
While 32 * 32 is incredible (and difficult to fully utilize), it seems that not being able to push each of these to the full 128 leaves some optimization on the table.

First, am I correct that the shaders are limited to the threads per wave?
Note that I see SV_DispatchThreadId become unstable for threads beyond the threads-per-wave count.
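
To make the question concrete, here is a minimal sketch of the kind of setup I mean. The names MSMain and Vertex are made up for illustration, and the idea that a thread's wave can be found by dividing its group index by the lane count is only an assumption, since I don't believe the thread-to-wave mapping is guaranteed.

```hlsl
// Hypothetical mesh shader requesting the full 128 threads per group.
// MSMain and Vertex are illustrative names, not from a real project.
struct Vertex
{
    float4 position : SV_Position;
};

[numthreads(128, 1, 1)]            // 128 is the documented per-group maximum
[outputtopology("triangle")]
void MSMain(
    uint groupIndex : SV_GroupIndex,   // expected range: 0..127 within the group
    out vertices Vertex verts[128],
    out indices uint3 tris[1])
{
    // Output counts must be declared before any output is written.
    SetMeshOutputCounts(128, 1);

    // Lanes per wave as reported by the hardware (32 on the Nvidia cards I tried).
    uint laneCount = WaveGetLaneCount();

    // ASSUMPTION: threads are packed into waves in group-index order.
    // With 128 threads and 32 lanes this would give wave slots 0..3.
    uint waveInGroup = groupIndex / laneCount;

    // One placeholder vertex per thread, tagged with the values above
    // so they can be inspected in a graphics debugger.
    verts[groupIndex].position =
        float4((float)groupIndex, (float)waveInGroup, 0.0f, 1.0f);

    // A single placeholder triangle, written by one thread only.
    if (groupIndex == 0)
    {
        tris[0] = uint3(0, 1, 2);
    }
}
```

If the group really does span four waves, I would expect every thread from 0 to 127 to write its own vertex here.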

Is the 128 just future hardware support already built in to the software?

Any insight on these would be helpful.