I have read a paper that claims data can be transferred directly between GPU kernels: http://pdcc.ntu.edu.sg/xtra/paper/2016Pub/GPL.pdf
In Appendix A, it claims that NVIDIA GPUs are able to “pass data directly from one kernel to another via Direct Data Transfer (DDT)”.
I am not able to find such a technique described on the NVIDIA website. Is there an official document for this technique?
“As in case of the AMD GPU, the latest GPUs from NVIDIA (Fermi or Kepler architectures) have also been enabled with concurrent kernel execution capability so that multiple kernels can be executed on the same GPU simultaneously. They also have the ability to pass data directly from one kernel to another via Direct Data Transfer (DDT)”
I have no idea what the authors could mean by DDT here. As best I can tell, it is their own private nomenclature. The normal way for kernels to “communicate” is that the first kernel deposits data into global memory and the second kernel picks up the data from global memory, with pointers passed to each kernel indicating where that data is located. Have you tried looking up the cited reference, which presumably introduces the term DDT?
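To make the global-memory handoff concrete, here is a minimal sketch. The kernel names `producer` and `consumer` and the helper `run_handoff` are made up for illustration and have nothing to do with the paper's code. Both kernels are launched into the same (default) stream, so the second one starts only after the first has finished writing.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// First kernel: deposits its results into global memory.
__global__ void producer(int *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 2 * i;
}

// Second kernel: picks the data up through the same pointer.
__global__ void consumer(const int *buf, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = buf[i] + 1;
}

// Hypothetical helper: returns 0 on success, 1 on a CUDA error or wrong result.
int run_handoff(int n) {
    int *buf = nullptr, *out = nullptr;
    if (cudaMalloc(&buf, n * sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&out, n * sizeof(int)) != cudaSuccess) { cudaFree(buf); return 1; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Same stream, so the launches execute in order: the consumer sees
    // everything the producer wrote to buf.
    producer<<<blocks, threads>>>(buf, n);
    consumer<<<blocks, threads>>>(buf, out, n);

    // The blocking copy also waits for both kernels to finish.
    std::vector<int> host(n);
    cudaError_t err = cudaMemcpy(host.data(), out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    int bad = (err != cudaSuccess) ? 1 : 0;
    for (int i = 0; i < n && !bad; ++i)
        if (host[i] != 2 * i + 1) bad = 1;

    cudaFree(buf);
    cudaFree(out);
    return bad;
}

int main() {
    int rc = run_handoff(1 << 20);
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

Note that the intermediate data in `buf` never leaves the GPU; only the final result is copied back to the host for checking.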
 Z. Chen, J. Xu, J. Tang, K. Kwiat, and C. Kamhoua, “G-storm: GPU-enabled high-throughput online data processing in storm.” In Big Data (Big Data), 2015 IEEE International Conference on, pages 307 - 312, Oct 2015.
Thanks for your reply. I have searched for “DDT” and “Direct Data Transfer” in that reference, but I cannot find any information about either. Maybe the reference is wrong. In Appendix A.1, the authors have also performed experiments on kernel communication on NVIDIA GPUs.
DDT is not any official part of OpenCL terminology that I am aware of, but since the originally linked paper mentions pipes, they may be referring to that on the AMD GPU side.
Since NVIDIA GPUs don’t officially support OpenCL 2.0 (and therefore pipes), the underlying kernel data transfer mechanism proposed must be different, and this is supported by a comment in the appendix:
“Unlike the AMD GPU, the NVIDIA GPU do not need users to set the packet size.”
The packet size is a pipe characteristic.
Therefore the kernel data transfer being proposed (“DDT”) on NVIDIA GPUs is almost certainly using an explicitly managed global buffer technique. Unfortunately, further detail here probably depends on the reference cited in the paper, which seems to be an IEEE paper that I cannot find publicly. As a guess, DDT may be nomenclature devised within that paper on “G-Storm” to cover kernel data transfer, whose underlying implementation varies somewhat between AMD and NVIDIA GPUs.
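Purely as a guess at what an explicitly managed global buffer could look like when combined with concurrent kernel execution, here is a sketch using two streams and events. It is not taken from either paper. The names `produce`, `consume`, and `run_pipeline` are hypothetical; the chunking scheme is my own assumption. Each chunk of the buffer is handed from the producer stream to the consumer stream as soon as it is ready, so producing chunk c+1 may overlap with consuming chunk c if the GPU has room.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Producer fills one chunk of the shared global buffer.
__global__ void produce(float *chunk, int n, float base) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) chunk[i] = base + i;
}

// Consumer reads the same chunk and writes its own output.
__global__ void consume(const float *chunk, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * chunk[i];
}

// Hypothetical helper: returns 0 on success, 1 on a CUDA error or wrong result.
int run_pipeline(int chunks, int n) {
    size_t total = (size_t)chunks * n;
    float *buf = nullptr, *out = nullptr;
    if (cudaMalloc(&buf, total * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc(&out, total * sizeof(float)) != cudaSuccess) { cudaFree(buf); return 1; }

    cudaStream_t prod, cons;
    cudaStreamCreate(&prod);
    cudaStreamCreate(&cons);
    std::vector<cudaEvent_t> ready(chunks);
    for (int c = 0; c < chunks; ++c)
        cudaEventCreateWithFlags(&ready[c], cudaEventDisableTiming);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int c = 0; c < chunks; ++c) {
        float *slice = buf + (size_t)c * n;
        produce<<<blocks, threads, 0, prod>>>(slice, n, (float)c * n);
        cudaEventRecord(ready[c], prod);         // chunk c is now written
        cudaStreamWaitEvent(cons, ready[c], 0);  // consumer stream waits for it
        consume<<<blocks, threads, 0, cons>>>(slice, out + (size_t)c * n, n);
    }
    cudaStreamSynchronize(cons);

    std::vector<float> host(total);
    int bad = (cudaGetLastError() != cudaSuccess) ? 1 : 0;
    if (cudaMemcpy(host.data(), out, total * sizeof(float),
                   cudaMemcpyDeviceToHost) != cudaSuccess) bad = 1;
    // Element k holds 2 * k; exact in float for the small sizes used here.
    for (size_t k = 0; k < total && !bad; ++k)
        if (host[k] != 2.0f * (float)k) bad = 1;

    for (int c = 0; c < chunks; ++c) cudaEventDestroy(ready[c]);
    cudaStreamDestroy(prod);
    cudaStreamDestroy(cons);
    cudaFree(buf);
    cudaFree(out);
    return bad;
}

int main() {
    int rc = run_pipeline(4, 1 << 16);
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

Every chunk gets its own slice of the buffer, so the producer never overwrites data the consumer has not read yet. Reusing a smaller ring of slices would need a second set of events going the other way.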
@txbob: You should have access to all IEEE publications through your employer (site license). Just visit the IEEE digital library from a computer at work. I have a limited digital subscription myself and will check whether it covers the G-storm reference. Note that OP stated that DDT is not actually defined in that reference, which they already consulted.
[Later:] Turns out the G-storm paper is covered by my limited personal subscription to IEEE’s digital library. According to the paper, G-Storm is a stream processing system built on top of JCuda. As OP already mentioned, there doesn’t seem to be anything like Direct Data Transfer (by name or concept) described in this paper. In fact, the system involves host/device copies: