What’s the best way to transfer a large array of data over PCIe into GPU-accessible memory on the TX2? Right now, we transfer the data into host memory and then copy it to device memory, but the extra copy seems sub-optimal. I see there’s this for Xavier, but it doesn’t mention TX2.
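For reference, the staged copy we do today looks roughly like the sketch below. It is simplified: `copy_staged_to_device` and `host_staging` are illustrative names, and it assumes the PCIe device's driver has already filled an ordinary host buffer with `int` samples.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Copies `count` ints from a host staging buffer (assumed to have been filled
// by the PCIe device's driver) into a newly allocated device buffer.
// Returns nullptr on failure; the caller releases the result with cudaFree.
int* copy_staged_to_device(const int* host_staging, size_t count)
{
    int* d_buf = nullptr;
    const size_t bytes = count * sizeof(int);

    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) {
        return nullptr;
    }
    // This is the second hop we would like to avoid.
    if (cudaMemcpy(d_buf, host_staging, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_buf);
        return nullptr;
    }
    return d_buf;
}
```

The `cudaMemcpy` here is the step in question: the data has already crossed PCIe once to reach `host_staging`, and this call moves it a second time.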
For this application, once the data is on the GPU, a kernel transposes it, zero-pads it and converts it from int to float (to prep for an FFT), after which the original data is discarded. Some headers in the data are accessed by the CPU, but we could move those to a separate transfer if needed.
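To make the processing step concrete, here is a minimal sketch of that kind of kernel. The layout is an assumption for illustration only: the input is a row-major `rows x cols` array of `int`, and the output is a row-major `cols x padded_rows` array of `float` with `padded_rows >= rows`. The names `transpose_pad_convert` and `run_transpose_pad_convert` are hypothetical, the kernel is a naive version with no shared-memory tiling, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Each thread writes one output element: out[c][r] = float(in[r][c]) for
// r < rows, and 0.0f in the zero-padded tail (rows <= r < padded_rows).
__global__ void transpose_pad_convert(const int* in, float* out,
                                      int rows, int cols, int padded_rows)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;  // position along the padded output row
    int c = blockIdx.y * blockDim.y + threadIdx.y;  // output row index (= input column)
    if (r >= padded_rows || c >= cols) {
        return;
    }
    out[c * padded_rows + r] =
        (r < rows) ? static_cast<float>(in[r * cols + c]) : 0.0f;
}

// Host wrapper used only to exercise the kernel on a small input.
std::vector<float> run_transpose_pad_convert(const std::vector<int>& in,
                                             int rows, int cols, int padded_rows)
{
    std::vector<float> out(static_cast<size_t>(cols) * padded_rows, -1.0f);
    int* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, in.size() * sizeof(int));
    cudaMalloc(&d_out, out.size() * sizeof(float));
    cudaMemcpy(d_in, in.data(), in.size() * sizeof(int), cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((padded_rows + block.x - 1) / block.x,
              (cols + block.y - 1) / block.y);
    transpose_pad_convert<<<grid, block>>>(d_in, d_out, rows, cols, padded_rows);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, out.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

Since the kernel reads each input element exactly once and the input is then thrown away, the input buffer only needs to be readable by the GPU for the duration of one kernel launch.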