Is it possible for a Tesla application to copy data directly from a PCIe DAQ card to the Tesla without involvement of the host CPU?
Let me explain the application I have in mind:
An external data acquisition box attached to PCIe through a cable and adapter generates “samples” at 60kHz. One such sample is about 15kB in size and represents a three-dimensional data structure of 8x8x60 32-bit values; roughly 900MB/s of data are generated in total. Once 256 such samples are accumulated, the resulting 8x8x60x256 dataset is processed by a series of FFTs, scalings, and rearrangements, most of which can be done in parallel (60-, 64-, or 256-way, depending on the operation). This is a continuous process: one set of processors would be accumulating data while another set processes the previously accumulated data, and the two sets alternate roles.
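To make the alternating scheme concrete, here is a rough sketch in plain Python of what I mean by two buffers swapping roles. The names `PingPong`, `push`, and `count_full_sets` are made up for illustration, and a real version would of course live in device memory rather than Python lists.

```python
SAMPLES_PER_SET = 256  # from the description: process once 256 samples are accumulated


class PingPong:
    """Two buffers: one fills while the other is handed off for processing."""

    def __init__(self, set_size=SAMPLES_PER_SET):
        self.set_size = set_size
        self.buffers = [[], []]
        self.active = 0  # index of the buffer currently being filled

    def push(self, sample):
        """Append a sample; return the full dataset when one is ready, else None."""
        buf = self.buffers[self.active]
        buf.append(sample)
        if len(buf) < self.set_size:
            return None
        # Swap roles: the other buffer starts filling, this one goes to processing.
        self.active ^= 1
        self.buffers[self.active] = []
        return buf


def count_full_sets(n_samples, set_size=SAMPLES_PER_SET):
    """Feed n_samples dummy samples and count how many full datasets come out."""
    pp = PingPong(set_size)
    full = 0
    for i in range(n_samples):
        if pp.push(i) is not None:
            full += 1
    return full
```

For example, `count_full_sets(512)` hands off two complete datasets, one from each buffer.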
If the data acquisition were performed by the main CPU, the main application processor would presumably be in charge of transferring 15kB into RAM at 60kHz, i.e. 900MB/s, and, at the end of each accumulation, of transferring the full dataset of 256 x 15kB = 3.75MB to a Tesla for processing at 60,000/256 = 234Hz, i.e. again 900MB/s. This requires double the bandwidth, 2 x 900MB/s. It would make more sense if a Tesla could be programmed to trigger the transfer of the 15kB samples at 60kHz itself, directly from the PCIe DAQ card into device memory, without involvement of the main CPU or RAM. Is this possible?
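For reference, the numbers above work out as follows. This is just the arithmetic from the description written out in Python; the variable names are mine, and the rounded figures in the text (15kB, 900MB/s) correspond to the exact values in the comments.

```python
VALUES_PER_SAMPLE = 8 * 8 * 60   # one sample is an 8x8x60 block
BYTES_PER_VALUE = 4              # 32-bit values
SAMPLE_RATE_HZ = 60_000          # samples arrive at 60kHz
SAMPLES_PER_SET = 256            # samples accumulated before processing

sample_bytes = VALUES_PER_SAMPLE * BYTES_PER_VALUE   # 15,360 bytes, about 15kB
stream_bytes_per_s = sample_bytes * SAMPLE_RATE_HZ   # 921,600,000 bytes/s, about 900MB/s
set_bytes = sample_bytes * SAMPLES_PER_SET           # 3,932,160 bytes = 3.75 MiB
set_rate_hz = SAMPLE_RATE_HZ / SAMPLES_PER_SET       # 234.375 datasets per second

# Going through host RAM moves every byte twice: DAQ -> RAM, then RAM -> Tesla.
host_path_bytes_per_s = 2 * stream_bytes_per_s       # 1,843,200,000 bytes/s

print(f"sample size:       {sample_bytes} bytes")
print(f"stream rate:       {stream_bytes_per_s / 1e6:.1f} MB/s")
print(f"dataset size:      {set_bytes / 2**20:.2f} MiB")
print(f"dataset rate:      {set_rate_hz:.3f} Hz")
print(f"via host RAM:      {host_path_bytes_per_s / 1e6:.1f} MB/s total")
```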
If so, how? And if not, would a contemporary motherboard with a quad-core CPU at 3GHz manage it all anyway?