What I observe on CUDA 11.4 is that things stop working shortly after the image size exceeds 2GB. For the code presented, the image size exceeds 2GB when the height
is larger than 347040. It doesn’t fail at 347041, but it does fail by 350000.
You may wish to file a bug. As a workaround, keep image sizes at 2GB or less.
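
If it helps, here is a rough host-side sketch of how you could check for the 2GB boundary before allocating. The helper names (`image_bytes`, `max_height_under_limit`, `fits_under_limit`) are made up for illustration, and I am assuming "2GB" means 2^31 bytes and that the image size is simply row pitch in bytes times height. The pitch of 6188 bytes mentioned below is only an assumption, chosen because it reproduces the 347040 figure above; substitute the pitch your code actually uses.

```cpp
#include <cstdint>

// Assumed limit: 2 GiB (2^31 bytes), the size past which failures were observed.
constexpr std::uint64_t kLimitBytes = 1ull << 31;

// Total image size in bytes, assuming size = row pitch (bytes) * height (rows).
constexpr std::uint64_t image_bytes(std::uint64_t pitch_bytes, std::uint64_t height) {
    return pitch_bytes * height;
}

// Largest height (in rows) that keeps the image at or under the limit.
// Returns 0 for a zero pitch to avoid dividing by zero.
constexpr std::uint64_t max_height_under_limit(std::uint64_t pitch_bytes) {
    return pitch_bytes == 0 ? 0 : kLimitBytes / pitch_bytes;
}

// True when an image of this pitch and height stays at or under the limit.
constexpr bool fits_under_limit(std::uint64_t pitch_bytes, std::uint64_t height) {
    return image_bytes(pitch_bytes, height) <= kLimitBytes;
}
```

With the assumed pitch of 6188 bytes, `max_height_under_limit(6188)` evaluates to 347040, and `fits_under_limit(6188, 347041)` is false. Using 64-bit arithmetic here matters, since the product overflows a 32-bit integer right at this boundary.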