I am working on a matrix multiplication project using CUDA. The input data comes as a 2D array, matrix[N][N]. Is there a function that copies this array to GPU memory directly? I looked at cudaMemcpy2D and cudaMemcpy2DToArray, but both seem to require flattening the 2D array to 1D first. Does anyone know how to copy matrix[N][N] between host memory and device memory without reshaping it?
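To make the question concrete, here is a minimal sketch of the kind of copy I have in mind. It assumes the matrix is a statically sized array (float matrix[N][N] with a compile-time N), which is stored contiguously in row-major order, so a single plain cudaMemcpy of sizeof(matrix) bytes might be enough. The element type float, the value N = 4, and the helper name roundTripMatches are my own assumptions for illustration. This would not apply to a float** built from separately allocated rows.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 4  // assumed compile-time size, for illustration only

// Hypothetical helper: copies matrix[N][N] to the device and back,
// then reports whether the round trip preserved every element.
bool roundTripMatches() {
    static float matrix[N][N];  // host input, contiguous in row-major order
    static float result[N][N];  // host buffer for the copy back

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            matrix[i][j] = static_cast<float>(i * N + j);

    // Pointer to rows of N floats, so device code could still index d_matrix[i][j].
    float (*d_matrix)[N] = nullptr;
    if (cudaMalloc(reinterpret_cast<void**>(&d_matrix), sizeof(matrix)) != cudaSuccess)
        return false;

    // One copy of the whole block in each direction; no flattening loop on the host.
    bool ok =
        cudaMemcpy(d_matrix, matrix, sizeof(matrix), cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(result, d_matrix, sizeof(matrix), cudaMemcpyDeviceToHost) == cudaSuccess;

    cudaFree(d_matrix);

    for (int i = 0; ok && i < N; ++i)
        for (int j = 0; j < N; ++j)
            if (result[i][j] != matrix[i][j]) ok = false;

    return ok;
}

int main() {
    std::printf("%s\n", roundTripMatches() ? "round trip ok" : "round trip failed");
    return 0;
}
```

Is something like this valid, or is there a dedicated function I should be using instead?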
Thank you very much!