CUDA documentation error: asynchronous kernel launches

Users keep asking why their kernel launches do not appear to be asynchronous.
Usually a knowledgeable user replies that the person asking is on Windows (7?), where kernel launches are batched before being submitted to the GPU.
The CUDA documentation should point out this issue and give the workaround.
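
For reference, the workaround usually suggested in those threads is to follow the launch with a non-blocking query such as cudaStreamQuery(0), which is commonly reported to push the batched launches to the GPU right away. Below is a minimal sketch of that idea, assuming the batching done by the Windows (WDDM) driver is the cause. The kernel spin, the helper launch_with_flush, and all sizes are hypothetical names and values made up for illustration. Only the CUDA runtime calls are real.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: busy-work so the launch runs long enough to overlap with CPU code.
__global__ void spin(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k)
            v = v * 1.0000001f + 1.0f;
        data[i] = v;
    }
}

// Hypothetical helper: launch, nudge the driver, then wait. Returns 0 on success.
int launch_with_flush()
{
    const int n = 1 << 20;
    float *d_data = nullptr;

    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMemset(d_data, 0, n * sizeof(float)) != cudaSuccess) { cudaFree(d_data); return 1; }

    // The launch returns to the host immediately, but the driver may hold it in a batch.
    spin<<<(n + 255) / 256, 256>>>(d_data, n, 1000);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d_data); return 1; }

    // Assumed workaround: a non-blocking query on the default stream.
    // It returns cudaSuccess if the work is done, cudaErrorNotReady if it is still running.
    cudaError_t q = cudaStreamQuery(0);
    if (q != cudaSuccess && q != cudaErrorNotReady) { cudaFree(d_data); return 1; }

    // ... independent CPU work would go here, overlapping with the kernel ...

    // Block until the kernel has finished before using its results.
    cudaError_t err = cudaDeviceSynchronize();
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    int rc = launch_with_flush();
    printf("%s\n", rc == 0 ? "ok" : "CUDA error");
    return rc;
}
```

Whether the query is needed at all depends on the driver model in use, so this is the kind of detail the documentation would have to confirm.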