This issue applies to the 346* and 352* drivers on Linux x86_64. You can see two of my discussions on other websites here (the first one is more detailed):
Essentially, my software issues clEnqueue* calls on different (device-exclusive) CommandQueues, and these absolutely should execute in parallel according to page 25 of the OpenCL 1.1 spec:
"It is possible to associate multiple queues with a single context. These queues run concurrently and independently with no explicit mechanisms within OpenCL to synchronize between them."
That is not happening at all, regardless of whether I specify in-order or out-of-order execution (which shouldn't matter anyway). I did confirm that CUDA applications can run concurrently (see first post, near bottom of first page).
I would appreciate some NVIDIA support on this, as it seems to imply that a fundamental component of the OpenCL spec is not working as advertised (if this is not a problem on my end, this is a serious issue). The drivers are advertised as supporting OpenCL 1.1 (and now 1.2), and they should probably do what they claim to…
Additionally, if someone here is running 2+ AMD cards, I would appreciate it greatly if you could confirm, with either textual or graphical profiler output, whether my code runs concurrently. You can clone my repo here:
Please check out the simple_events branch only (as it is how I would write a typical OpenCL application).
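If you post textual profiler output, one quick way to judge concurrency is to compare the execution windows of two commands that ran on different devices. The helper below is only a sketch and is not taken from my repo. The cmd_window struct and overlap_ns function are hypothetical names. It assumes the queues were created with CL_QUEUE_PROFILING_ENABLE, that the timestamps come from clGetEventProfilingInfo (CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, in nanoseconds), and that both devices report against a comparable time base, which may not hold on every platform.

```c
#include <assert.h>
#include <stdint.h>

/* One command's profiled execution window, in nanoseconds.
   Hypothetical struct: fill it from clGetEventProfilingInfo using
   CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} cmd_window;

/* Returns how many nanoseconds both commands were executing at once.
   A result of 0 means the two commands were fully serialized. */
uint64_t overlap_ns(cmd_window a, cmd_window b)
{
    /* A window that ends before it starts indicates bad profiling data. */
    assert(a.end_ns >= a.start_ns && b.end_ns >= b.start_ns);

    uint64_t latest_start = a.start_ns > b.start_ns ? a.start_ns : b.start_ns;
    uint64_t earliest_end = a.end_ns < b.end_ns ? a.end_ns : b.end_ns;

    return earliest_end > latest_start ? earliest_end - latest_start : 0;
}
```

A nonzero result for kernels enqueued on two different devices would suggest the queues really do run concurrently. A result of 0 across every pair would match the serialized behavior I am seeing.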