Using CUDA streams to transfer a two-dimensional array in blocks of rows (CUDA C)

How can I use streams in CUDA C to pass a two-dimensional array to a kernel function, partitioned by rows?
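One common approach is sketched below. It assumes the matrix is stored as a single contiguous row-major buffer of `float`. The rows are split into bands, and each stream runs three operations for its band: an asynchronous host-to-device copy, a kernel launch, and a device-to-host copy. The kernel `scaleRows`, the helper `processByRowBands`, the `CHECK` macro, and the sizes are hypothetical illustrations, not part of the original question.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (hypothetical helper).
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                    cudaGetErrorString(err_));                       \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Hypothetical example kernel: doubles every element of one band of rows.
// The band arrives as a flat row-major pointer plus its row count.
__global__ void scaleRows(float *band, int bandRows, int cols) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < bandRows * cols) band[idx] *= 2.0f;
}

// Processes a rows x cols row-major matrix in nStreams bands of rows.
// Each band is copied in, processed, and copied out on its own stream,
// so one band's transfers can overlap another band's kernel.
void processByRowBands(float *h, int rows, int cols, int nStreams) {
    size_t bytes = (size_t)rows * cols * sizeof(float);
    float *d;
    CHECK(cudaMalloc((void **)&d, bytes));

    cudaStream_t *streams = (cudaStream_t *)malloc(nStreams * sizeof(cudaStream_t));
    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    int rowsPerBand = (rows + nStreams - 1) / nStreams;  // ceiling division
    for (int s = 0; s < nStreams; ++s) {
        int firstRow = s * rowsPerBand;
        if (firstRow >= rows) break;                       // no rows left
        int bandRows = (rows - firstRow < rowsPerBand) ? rows - firstRow : rowsPerBand;
        size_t offset = (size_t)firstRow * cols;           // band start, in elements
        size_t bandBytes = (size_t)bandRows * cols * sizeof(float);
        int n = bandRows * cols;
        int threads = 256;
        int blocks = (n + threads - 1) / threads;

        CHECK(cudaMemcpyAsync(d + offset, h + offset, bandBytes,
                              cudaMemcpyHostToDevice, streams[s]));
        scaleRows<<<blocks, threads, 0, streams[s]>>>(d + offset, bandRows, cols);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h + offset, d + offset, bandBytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }
    CHECK(cudaDeviceSynchronize());  // wait for every stream to finish

    for (int s = 0; s < nStreams; ++s) CHECK(cudaStreamDestroy(streams[s]));
    free(streams);
    CHECK(cudaFree(d));
}

int main(void) {
    const int rows = 1000, cols = 512, nStreams = 4;
    float *h;
    // Pinned (page-locked) host memory lets cudaMemcpyAsync overlap with kernels.
    CHECK(cudaMallocHost((void **)&h, (size_t)rows * cols * sizeof(float)));
    for (int i = 0; i < rows * cols; ++i) h[i] = (float)i;

    processByRowBands(h, rows, cols, nStreams);

    for (int i = 0; i < rows * cols; ++i) assert(h[i] == 2.0f * i);  // sanity check
    printf("h[0]=%.1f h[last]=%.1f\n", h[0], h[rows * cols - 1]);
    CHECK(cudaFreeHost(h));
    return 0;
}
```

In row-major storage, each band of rows is contiguous. A band is therefore just an offset of `firstRow * cols` elements, and one `cudaMemcpyAsync` per band is enough. This does not work for an array built from separately allocated row pointers (`float **`). Flatten that kind of array into one buffer first, or use `cudaMemcpy2DAsync` for pitched layouts. Overlap between copies and kernels also requires pinned host memory, which is why the sketch allocates with `cudaMallocHost`.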