Those are not exclusions. This is configuration data documented to aid in reproducibility. When you say “incoming data stream”, where exactly is that data coming from? If it is coming in over PCIe, you will be limited to about 12.5 GB/sec in practice (PCIe gen3 x16 link), and that assumes large transfer sizes. Keep in mind that you will need sufficient GPU memory to hold all the FFT data at once (input, output, and FFT temporary storage).
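To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It is not a measurement: the function names are made up for illustration, the element size assumes single-precision complex data (8 bytes), and the temporary-storage size is simply assumed to equal the input size. For a real plan you would query the actual work-area size from cuFFT instead of guessing.

```python
# Rough estimate only; all names here are hypothetical helpers, not cuFFT API.
PCIE_GEN3_X16_GB_PER_S = 12.5  # practical rate quoted above, 1 GB = 1e9 bytes


def fft_footprint_bytes(n_points, batch=1, bytes_per_element=8,
                        workspace_factor=1.0):
    """Out-of-place footprint: input + output + temporary storage.

    workspace_factor is an ASSUMPTION (temporary storage as a multiple of
    the input size); the real value depends on the plan.
    """
    data = n_points * batch * bytes_per_element
    return int(data * (2 + workspace_factor))


def pcie_transfer_seconds(n_bytes, rate_gb_per_s=PCIE_GEN3_X16_GB_PER_S):
    """Time to move n_bytes across the link at the given sustained rate."""
    return n_bytes / (rate_gb_per_s * 1e9)


if __name__ == "__main__":
    n_points, batch = 2**20, 64          # example sizes, chosen arbitrarily
    payload = n_points * batch * 8       # bytes of input data
    print(f"input size      : {payload / 1e9:.3f} GB")
    print(f"GPU memory need : {fft_footprint_bytes(n_points, batch) / 1e9:.3f} GB")
    print(f"PCIe upload time: {pcie_transfer_seconds(payload) * 1e3:.1f} ms")
```

If the upload time comes out larger than the FFT time reported in the benchmark data, the PCIe link rather than the GPU is the bottleneck for a streaming use case.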
Excluding plan creation from benchmarking data makes sense because most applications do more than a single FFT during their entire run time. Rather, they create a plan once and then run multiple FFTs with that plan. This usage model is really no different from what you would do with FFTW, for example. If you need fast plan creation, make sure to use a fast host system (high single-thread performance).
Note that the Tesla P100, for which NVIDIA provides performance data at the above link, has very high memory bandwidth, about 10% higher than the Titan V. The GPU bandwidth numbers listed in specifications (720 GB/sec for Tesla P100, 653 GB/sec for Titan V) are theoretical numbers based on multiplying signalling speed by interface width; the same applies to bandwidth specifications you find on Intel’s website, for example. In practice you should be able to achieve about 80% of that.
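The arithmetic behind these figures can be sketched as follows. The helper names are hypothetical, the 80% efficiency is the rule of thumb from above rather than a measured value, and the Titan V pin rate and bus width used in the example (1.7 Gbit/sec per pin, 3072-bit interface) are my assumption about how the listed 653 GB/sec figure is derived.

```python
# Rule-of-thumb arithmetic only; helper names are hypothetical.

def theoretical_bandwidth_gb_s(pin_rate_gbit_s, bus_width_bits):
    """Spec-sheet bandwidth: signalling speed times interface width, in GB/sec."""
    return pin_rate_gbit_s * bus_width_bits / 8


def achievable_bandwidth_gb_s(theoretical_gb_s, efficiency=0.8):
    """Apply the ~80% rule of thumb to a theoretical bandwidth figure."""
    return theoretical_gb_s * efficiency


if __name__ == "__main__":
    # Spec-sheet numbers quoted in the text above.
    specs = {"Tesla P100": 720.0, "Titan V": 653.0}
    for name, peak in specs.items():
        print(f"{name}: {peak:.0f} GB/sec theoretical, "
              f"~{achievable_bandwidth_gb_s(peak):.0f} GB/sec achievable")
    print(f"P100 / Titan V ratio: {specs['Tesla P100'] / specs['Titan V']:.2f}")

    # ASSUMED Titan V memory configuration, for illustration of the formula.
    print(f"Titan V from pin rate: {theoretical_bandwidth_gb_s(1.7, 3072):.1f} GB/sec")
```

Since large FFTs are usually limited by memory bandwidth, the achievable figure is the one to use when estimating how the published P100 results would scale to a Titan V.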