is an exception to 1). It has only recently been documented by Nvidia, so most lectures predate this.
Actually, I think only host->device copies of 64 KB or less are asynchronous. An asynchronous device->host copy would violate the API: many programs read the result as soon as the copy call returns, and they would break unless extra synchronization were added.
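To make the argument concrete, here is a minimal sketch. It assumes a CUDA-capable device, ordinary pageable host memory and the default stream, and relies on the documented behaviour that a pageable host->device cudaMemcpy returns only after the source has been staged, while a device->host cudaMemcpy blocks until the data is on the host. The kernel `increment` and the helper `increment_roundtrip_ok` are hypothetical names used only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: adds 1 to every element.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Returns true if the host sees the kernel's result without any
// explicit cudaDeviceSynchronize() call.
bool increment_roundtrip_ok() {
    const int n = 1024;                 // 4 KB of ints: a small copy
    static int host[n];                 // ordinary pageable host memory
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return false;

    // Host->device from pageable memory: the call may return before the DMA
    // into device memory has finished, but only after `host` has been staged,
    // so overwriting `host` right away does not corrupt the copy.
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    for (int i = 0; i < n; ++i) host[i] = -1;

    // Same (default) stream: the kernel is ordered after the copy.
    increment<<<(n + 255) / 256, 256>>>(dev, n);

    // Device->host: blocks until the kernel and the copy are done, which is
    // what lets typical code read `host` immediately afterwards.
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (host[i] != i + 1) return false;
    return true;
}

int main() {
    bool ok = increment_roundtrip_ok();
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

If the device->host copy returned early, the final loop could read stale values, which is exactly the kind of breakage an asynchronous device->host cudaMemcpy would cause in existing code.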