CUDA Streams

Hi everybody,

I’m currently working on an OpenCL-powered application.
A year ago I was working with Nvidia CUDA. CUDA features
so-called CUDA streams, which allow memory transfers and
kernel execution to run in parallel, as a variant of the
well-known DMA transfer.

Now I am wondering whether this kind of parallelism is
available in Nvidia’s OpenCL implementation. In OpenCL the
programmer can also create multiple command queues for the
same device. Does parallel execution work here too?

Thanks for any responses

Yes, parallel execution works here too: read the docs for the clEnqueueReadBuffer()/clEnqueueWriteBuffer() and clEnqueueNDRangeKernel() functions, and look carefully at the event-related arguments. These make it possible to build an arbitrary graph of dependencies between memory-transfer and kernel-execution operations in your program.
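A minimal sketch of that pattern, assuming a context, device, kernel, and buffers have already been set up (the queue and variable names here are illustrative, and error checking is omitted for brevity):

```c
#include <CL/cl.h>

/* Sketch: overlap a host-to-device copy with kernel execution using
 * two command queues and an event dependency between them. */
void overlap_example(cl_context ctx, cl_device_id dev, cl_kernel kernel,
                     cl_mem buf_in, cl_mem buf_out,
                     const float *host_in, float *host_out, size_t n)
{
    /* Two queues on the same device: one for transfers, one for compute
     * (the rough analogue of using two CUDA streams). */
    cl_command_queue xfer_q = clCreateCommandQueue(ctx, dev, 0, NULL);
    cl_command_queue comp_q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_event write_done, kernel_done;

    /* Non-blocking write; the returned event signals its completion. */
    clEnqueueWriteBuffer(xfer_q, buf_in, CL_FALSE, 0, n * sizeof(float),
                         host_in, 0, NULL, &write_done);

    /* The kernel waits only on the write event, not on the whole
     * transfer queue, so other transfers can proceed concurrently. */
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf_in);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &buf_out);
    clEnqueueNDRangeKernel(comp_q, kernel, 1, NULL, &n, NULL,
                           1, &write_done, &kernel_done);

    /* Blocking read that waits for the kernel via its event. */
    clEnqueueReadBuffer(xfer_q, buf_out, CL_TRUE, 0, n * sizeof(float),
                        host_out, 1, &kernel_done, NULL);

    clReleaseEvent(write_done);
    clReleaseEvent(kernel_done);
    clReleaseCommandQueue(xfer_q);
    clReleaseCommandQueue(comp_q);
}
```

Whether the copy and the kernel actually overlap in hardware still depends on the device and driver, but the event arguments are what let the runtime schedule them concurrently.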

What were streams in CUDA are now command queues in OpenCL.