OpenCL Dual Copy Engines

Hi,

I have a few quick questions regarding dual DMA (copy) engines. NVIDIA published a white paper on the dual DMA engines in their Quadro cards, and I was wondering whether they also work in OpenCL. The paper only mentions CUDA.

I am currently running a Quadro K5100M, and my project uses compute and transfer overlap. When I profile it, however, the reads and writes do not overlap with each other, which is what I would expect to see if both DMA engines were in use. I would like to know whether I am doing something wrong or whether this is simply not possible in OpenCL.

Thanks!

B Ha