Since it’s unclear how bug reporting works with Nvidia these days, I will post this here…
The runtime API documentation (https://docs.nvidia.com/cuda/pdf/CUDA_Runtime_API.pdf) could be clearer with regard to cudaMemcpy2DToArrayAsync.
The documentation does not clearly state that wOffset (the column offset) must be given in bytes, not in elements:
Compare this to cudaMemcpy2DAsync, where it is clearly stated in the text:
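To make the bytes-versus-elements point concrete, here is a minimal host-side sketch. It assumes, as described in this post, that wOffset is a byte offset. The helper columnOffsetInBytes is hypothetical (it is not part of the CUDA runtime), and the variable names in the commented call are placeholders only.

```cpp
#include <cstddef>

// Hypothetical helper, not part of the CUDA runtime: converts a column index
// counted in elements into a byte offset, which is what this post says the
// wOffset parameter of cudaMemcpy2DToArrayAsync expects.
template <typename T>
constexpr std::size_t columnOffsetInBytes(std::size_t columnIndex) {
    return columnIndex * sizeof(T);
}

// Sketch of the intended use (placeholder names, shown as a comment so the
// snippet compiles without the CUDA toolkit):
//
//   // Start writing at element column 8 of a float array.
//   std::size_t wOffset = columnOffsetInBytes<float>(8);  // 8 * sizeof(float)
//   cudaMemcpy2DToArrayAsync(dstArray, wOffset, hOffsetRows,
//                            src, srcPitchBytes, widthBytes, heightRows,
//                            cudaMemcpyHostToDevice, stream);
//
// Passing the raw element index 8 instead would, under the same assumption,
// place the copy at byte 8, i.e. at element column 2 for float data.
```

Doing the conversion in one named place keeps the unit explicit at the call site, which is exactly what the documentation leaves ambiguous.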
Thanks