Why is there no `cudaMallocArrayAsync`?

There must be something special about cudaArray allocations. If there weren't, the CUDA designers would presumably have backed arrays with ordinary cudaMalloc allocations, and a stream-ordered variant would then have been easy to provide.

I'm not aware of any way to supply the underlying allocation for a cudaArray other than cudaMallocArray and its relatives, such as cudaMalloc3DArray and cudaMallocMipmappedArray.
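For contrast, here is a minimal sketch of the two allocation paths. It assumes CUDA 11.2 or later and a device that supports stream-ordered memory pools. Linear device memory can be allocated in stream order with cudaMallocAsync, while a cudaArray has to come from cudaMallocArray, which takes no stream argument. The sizes and the CHECK macro are illustrative choices, not anything from the original question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the failing call and return early from main if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s failed: %s\n", #call,              \
                         cudaGetErrorString(err_));                     \
            return 1;                                                   \
        }                                                               \
    } while (0)

int main() {
    const size_t width = 256, height = 256;  // illustrative sizes
    const size_t bytes = width * height * sizeof(float);

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Linear memory: a stream-ordered allocation is available.
    float* linear = nullptr;
    CHECK(cudaMallocAsync(reinterpret_cast<void**>(&linear), bytes, stream));

    // cudaArray: allocated with cudaMallocArray, which has no stream parameter.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t array = nullptr;
    CHECK(cudaMallocArray(&array, &desc, width, height));

    // Work on the array can still be stream-ordered once it exists.
    CHECK(cudaMemsetAsync(linear, 0, bytes, stream));
    CHECK(cudaMemcpy2DToArrayAsync(array, 0, 0, linear,
                                   width * sizeof(float),  // source pitch in bytes
                                   width * sizeof(float),  // row width in bytes
                                   height, cudaMemcpyDeviceToDevice, stream));

    // Free the linear buffer in stream order; the array is freed synchronously.
    CHECK(cudaFreeAsync(linear, stream));
    CHECK(cudaStreamSynchronize(stream));
    CHECK(cudaFreeArray(array));
    CHECK(cudaStreamDestroy(stream));

    std::printf("ok\n");
    return 0;
}
```

On a device that supports memory pools, the program should print `ok`. The asymmetry is the point: copies into the array can be queued on a stream, but its allocation and release cannot.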

AFAIK, NVIDIA does not document the reason.

If you’d like to see a change in CUDA or its documentation, one possibility is to file a bug.