CUDA is not SIMD. It is SIMT.
Explicit synchronization is required wherever threads cooperate, for example when one thread reads shared memory that another thread in the same warp has written.
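As a minimal sketch (not taken from the blog), the kernel below has a single warp cooperatively sum 32 integers through shared memory. A warp is not guaranteed to execute in lockstep, so `__syncwarp()` separates each round of writes from the reads of the next round. Older warp-synchronous code often left these calls out and relied on implicit SIMD behavior. The names `warpSum` and `warpSumOnDevice` are hypothetical. The sketch assumes a launch of exactly one block of 32 threads, and error handling is reduced to a single assert.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// One warp sums 32 ints through shared memory.
// Assumes blockDim.x == 32 (exactly one warp per block).
__global__ void warpSum(const int *in, int *out)
{
    __shared__ int buf[32];
    unsigned lane = threadIdx.x;

    buf[lane] = in[lane];
    __syncwarp();  // every lane's write is visible before any lane reads it

    for (int offset = 16; offset > 0; offset /= 2) {
        // Reads come from [offset, 2*offset), writes go to [0, offset): disjoint within a round.
        if (lane < offset) buf[lane] += buf[lane + offset];
        // All 32 lanes reach this call (it sits outside the if), so the default
        // full mask is valid. It orders this round's writes before the next round's reads.
        __syncwarp();
    }
    if (lane == 0) *out = buf[0];
}

// Hypothetical host helper: copies 32 ints to the device, runs one warp, returns the sum.
int warpSumOnDevice(const int h_in[32])
{
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, 32 * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, 32 * sizeof(int), cudaMemcpyHostToDevice);

    warpSum<<<1, 32>>>(d_in, d_out);  // exactly one warp
    assert(cudaGetLastError() == cudaSuccess);

    int h_out = 0;
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Called on the values 1 through 32, `warpSumOnDevice` should return 528. Only one warp cooperates here, so `__syncwarp()` is sufficient. If threads from different warps of the block shared the buffer, `__syncthreads()` would be needed instead.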
There is no way to force SIMD (lockstep) execution other than the methodology covered in the blog. The approach of compiling for cc6.0 (or whatever the target may be) will stop working once cc6.0 support is dropped.
The way to request changes to CUDA or CUDA documentation is to file a bug.