Queue Depth for Async Memcpy

How many host-to-device async memcpys can the host queue up?
Once the limit is reached, will cudaMemcpyAsync block, or will it fail?

This topic may be helpful, I think.
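
I'm not aware of a documented limit, so it may be easiest to probe it on your own setup. Below is a rough sketch, not a definitive test: it stalls a stream with a spinning kernel, then times each cudaMemcpyAsync call on the host and checks its return code. The kernel name spin, the constants kMaxCopies, kSpinSeconds and kBlockedMs, and the 5 ms "blocked" threshold are all my own assumptions; the result will likely depend on the driver, GPU and OS.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Busy-wait on the GPU for roughly `cycles` SM clock ticks, so that work
// issued to the same stream afterwards has to wait in the queue.
__global__ void spin(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

int main() {
    const int    kMaxCopies   = 20000;  // assumed upper bound for the probe
    const double kSpinSeconds = 1.0;    // assumed; keep below any display watchdog
    const double kBlockedMs   = 5.0;    // assumed threshold for "the call blocked"

    int clockKHz = 0;
    if (cudaDeviceGetAttribute(&clockKHz, cudaDevAttrClockRate, 0) != cudaSuccess) {
        std::printf("no usable CUDA device\n");
        return 1;
    }

    int *hostBuf = nullptr, *devBuf = nullptr;
    cudaStream_t stream;
    cudaMallocHost((void **)&hostBuf, sizeof(int));  // pinned, so the copy can be async
    cudaMalloc((void **)&devBuf, sizeof(int));
    cudaStreamCreate(&stream);
    *hostBuf = 0;

    // Stall the stream; every copy issued below queues up behind this kernel.
    spin<<<1, 1, 0, stream>>>((long long)(clockKHz * 1000.0 * kSpinSeconds));

    int i = 0;
    for (; i < kMaxCopies; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        cudaError_t err = cudaMemcpyAsync(devBuf, hostBuf, sizeof(int),
                                          cudaMemcpyHostToDevice, stream);
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

        if (err != cudaSuccess) {   // the call failed instead of blocking
            std::printf("copy %d failed: %s\n", i, cudaGetErrorString(err));
            break;
        }
        if (ms > kBlockedMs) {      // the call took unusually long: probably blocked
            std::printf("copy %d took %.2f ms, so the call probably blocked\n", i, ms);
            break;
        }
    }
    if (i == kMaxCopies)
        std::printf("queued %d copies without a slow or failed call\n", kMaxCopies);

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return 0;
}
```

Build with something like nvcc probe.cu -o probe. Note that a slow call can also come from ordinary OS scheduling jitter, and if the host issues copies too slowly the kernel may finish before the queue fills, so it is worth running this a few times and adjusting the constants before trusting the number.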