I need to perform a set of slicing operations that move data between slices of 3D complex float arrays and 2D real arrays, all allocated contiguously. The operations have the following form (zero-indexed, Python-style notation):
X[:M, 0, :P].real = B[:,:]
Y[0, :N, :P].real = C[:,:]
D[:,:] = Z[:M, N-1, P:P+T-1].real
E[:,:] = Z[M-1, :N, :T].real
I have no idea how to proceed other than by brute force. I assume that I can use a texture for reading the 3D arrays.
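To make the indexing concrete, here is a minimal sketch of what the brute-force approach could look like in plain C, covering the first and last operations. It rests on assumptions not stated above: the arrays are row-major with the last index varying fastest, the complex type is an interleaved (re, im) pair of floats, and the full shape of X and Z is (n0, n1, n2). The names cfloat, idx3, scatter_real_j0 and gather_real_ilast are made up for illustration.

```c
#include <stddef.h>

/* Assumed layout: interleaved (re, im) pair of floats per element. */
typedef struct { float re, im; } cfloat;

/* Flat offset of element [i][j][k] in a row-major array whose
   second and third dimensions are n1 and n2 (assumption). */
static size_t idx3(size_t i, size_t j, size_t k, size_t n1, size_t n2)
{
    return (i * n1 + j) * n2 + k;
}

/* X[:M, 0, :P].real = B[:, :], with B stored as an M x P row-major array. */
static void scatter_real_j0(cfloat *X, size_t n1, size_t n2,
                            const float *B, size_t M, size_t P)
{
    for (size_t i = 0; i < M; ++i)
        for (size_t k = 0; k < P; ++k)
            X[idx3(i, 0, k, n1, n2)].re = B[i * P + k];  /* imaginary part untouched */
}

/* E[:, :] = Z[M-1, :N, :T].real, with E stored as an N x T row-major array. */
static void gather_real_ilast(float *E, const cfloat *Z, size_t n1, size_t n2,
                              size_t M, size_t N, size_t T)
{
    for (size_t j = 0; j < N; ++j)
        for (size_t k = 0; k < T; ++k)
            E[j * T + k] = Z[idx3(M - 1, j, k, n1, n2)].re;
}
```

In a CUDA version, each iteration of the double loop would presumably become one thread, with the inner index k mapped to threadIdx.x so that neighbouring threads touch neighbouring addresses.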
I am new to CUDA, so tips at any level would be appreciated. I didn't expect this to be the most challenging part of porting optimization code from C to CUDA. I hope that the enormous speedup in other parts of the code is not nullified by these operations, which I have to do frequently.
(Don't be surprised if the array slicing gets rendered as emoticons.)