I’ve been trying to build a multi-command-queue application for the past few days and haven’t figured out how to do it. The goal is simply to perform memory transfers while a kernel is executing, so that the transfer time is overlapped with computation.
Here is what I did:
- I create two command queues in the same context.
- I enqueue a kernel execution into the first and a non-blocking (device-to-host) memory transfer into the second.
- Using the OpenCL Visual Profiler, I can clearly see my two streams, but they don’t run in parallel.
Has anyone succeeded in hiding transfer time? Is there an optional parameter that needs to be enabled? Do I need to enqueue the kernel command and the transfer command from two different host (CPU) threads?
I’d appreciate some help…