Why is Bitonic sort faster in parallel than other sorts?

All of the fastest GPU sorts I’m familiar with, such as those in moderngpu, cub, and thrust (before it switched to using cub), use radix sort.
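For anyone unfamiliar with radix sort, here is a minimal sequential sketch of the least-significant-digit (LSD) variant in Python. It is only an illustration, not the code used in moderngpu, cub, or thrust. The function name `radix_sort_lsd` and the `bits_per_pass` parameter are my own, and it assumes non-negative integer keys. Each pass is a histogram, an exclusive prefix sum, and a stable scatter. As far as I know, GPU radix sorts are typically built by parallelizing those same three steps.

```python
def radix_sort_lsd(keys, bits_per_pass=4):
    """Sort non-negative integers with an LSD radix sort (illustrative sketch)."""
    out = list(keys)
    if not out:
        return out

    radix = 1 << bits_per_pass   # number of distinct digit values per pass
    mask = radix - 1             # extracts one digit after shifting
    max_key = max(out)
    shift = 0

    # One pass per digit, starting from the least significant digit.
    while (max_key >> shift) > 0:
        # 1. Histogram: count how many keys have each digit value.
        counts = [0] * radix
        for k in out:
            counts[(k >> shift) & mask] += 1

        # 2. Exclusive prefix sum: first output slot for each digit value.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]

        # 3. Stable scatter: keys with equal digits keep their relative order,
        #    which is what makes the digit-by-digit passes compose correctly.
        scattered = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            scattered[offsets[d]] = k
            offsets[d] += 1

        out = scattered
        shift += bits_per_pass

    return out
```

Note that the number of passes depends only on the key width and `bits_per_pass`, not on the number of keys, so the total work is linear in the input size for fixed-width keys.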