I wonder why Bitonic sort is faster than other sorts

I’ve heard that in parallel, Bitonic sort is faster than other sorts.

Why is Bitonic sort faster in parallel than other sorts?

All of the fastest GPU sorts I’m familiar with, such as those in moderngpu, cub, and thrust (before it switched to using cub), use radix sort.
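For context on where bitonic sort's parallel reputation comes from, here is a minimal sequential sketch of the sorting network in Python. It is only an illustration, not how any of the libraries above implement sorting: the function name `bitonic_sort` is made up for this post, and it assumes the input length is a power of two. The thing to notice is that the innermost loop touches disjoint pairs of elements in a pattern that does not depend on the data, so on a GPU each iteration of that loop could be handled by its own thread.

```python
def bitonic_sort(values):
    """Return a new ascending list; len(values) must be a power of two (assumption)."""
    a = list(values)
    n = len(a)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")

    k = 2  # size of the bitonic sequences being merged in this phase
    while k <= n:
        j = k // 2  # distance between the two elements of each compare-exchange
        while j >= 1:
            # One stage of the network. Every pair (i, i ^ j) is disjoint from
            # every other pair, so all iterations of this loop are independent
            # and could run in parallel.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0  # direction of this block
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

For n elements the network has log2(n) * (log2(n) + 1) / 2 stages of n / 2 compare-exchanges each, so the total work is O(n log² n). That regular, branch-free structure is what makes it attractive in parallel, but the extra work compared with a radix sort is one reason it does not automatically win on large inputs.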