Hi everybody,
I’m currently working on an OpenCL-powered application.
A year ago I was working with Nvidia CUDA. CUDA
features so-called CUDA streams, which allow
memory transfers and kernel execution to run in parallel,
as a variant of the well-known DMA transfer.
Now I am wondering whether this kind of parallelization
is supported in Nvidia’s OpenCL port. In OpenCL, the programmer
can also create multiple command queues for the same device.
Does the parallel execution work here too?
Thx for any responses