Can we have multiple work-groups on one multiprocessor? Concept comparison of CUDA and OpenCL

I’m a bit confused about how the concepts in CUDA and OpenCL compare.

In OpenCL, is a “compute unit” physically a multiprocessor, or just a virtual group of many threads? It seems like a compute unit is equivalent to a thread block in CUDA, but in CUDA we can choose the block size as we like and decide how many blocks can be launched on a single multiprocessor. Can we change the compute unit size, i.e. how many processing elements are in one compute unit? And how many compute units can be launched on one multiprocessor?
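
In OpenCL the number of compute units is a fixed property of the device (reported by `clGetDeviceInfo` with `CL_DEVICE_MAX_COMPUTE_UNITS`); what the programmer chooses is the work-group size, passed as `local_work_size` to `clEnqueueNDRangeKernel`. A compute unit can typically hold several work-groups at once, as long as their combined resource use fits. The Python sketch below is a simplified, hypothetical estimate of that limit: the name `resident_groups` and the resource model are illustrative assumptions, and the example numbers are Fermi (compute capability 2.x) per-multiprocessor limits. It ignores register and warp allocation granularity, so treat it as a rough bound rather than an occupancy calculator.

```python
# Simplified estimate of how many work-groups (CUDA thread blocks) can be
# resident on one compute unit (CUDA multiprocessor) at the same time.
# Hypothetical model: ignores allocation granularity and warp rounding.
from dataclasses import dataclass


@dataclass
class ComputeUnitLimits:
    max_threads: int      # max resident work-items per compute unit
    max_groups: int       # max resident work-groups per compute unit
    registers: int        # 32-bit registers per compute unit
    local_mem_bytes: int  # local (CUDA "shared") memory per compute unit


# Example values for a Fermi (compute capability 2.x) multiprocessor in the
# 48 KB shared-memory configuration; check your own device's limits.
FERMI_SM = ComputeUnitLimits(
    max_threads=1536, max_groups=8, registers=32768, local_mem_bytes=48 * 1024
)


def resident_groups(limits, group_size, regs_per_item, local_mem_per_group):
    """Return how many work-groups of this shape fit on one compute unit."""
    by_threads = limits.max_threads // group_size
    by_regs = (
        limits.registers // (regs_per_item * group_size)
        if regs_per_item > 0
        else limits.max_groups
    )
    by_local = (
        limits.local_mem_bytes // local_mem_per_group
        if local_mem_per_group > 0
        else limits.max_groups
    )
    # The scarcest resource decides how many groups share the compute unit.
    return min(limits.max_groups, by_threads, by_regs, by_local)


# 256 work-items per group, 20 registers each, 4 KB local memory per group:
# threads allow 6, registers allow 6 (32768 // 5120), local memory allows 12.
print(resident_groups(FERMI_SM, 256, 20, 4096))  # 6
```

With small groups the per-unit group cap becomes the limit instead, e.g. 64-item groups give 8 resident groups even though 24 would fit by thread count.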

Is a work-item a single core or just a single thread?
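
A work-item is a logical thread, not a physical core: the hardware time-multiplexes many work-items over each core (NVIDIA GPUs execute them in groups of 32 called warps). The hypothetical Python sketch below only enumerates a 1-D NDRange to show how global, group, and local IDs relate, mirroring OpenCL's `get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)` and CUDA's `blockIdx.x * blockDim.x + threadIdx.x`. It assumes a zero global offset and the OpenCL 1.x rule that the global size be a multiple of the local size.

```python
# Enumerate the work-items of a 1-D NDRange and split each global ID into
# (work-group ID, local ID). Hypothetical helpers, for illustration only.

def decompose(global_id, local_size):
    """Split a global work-item ID into (group_id, local_id)."""
    return divmod(global_id, local_size)


def ndrange(global_size, local_size):
    """Yield (global_id, group_id, local_id) for every work-item."""
    if global_size % local_size != 0:
        # OpenCL 1.x requires the global size to be a multiple of the local size.
        raise ValueError("global_size must be a multiple of local_size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            yield group_id * local_size + local_id, group_id, local_id


items = list(ndrange(global_size=1024, local_size=256))
print(len(items))           # 1024 work-items (logical threads) in 4 groups
print(items[-1])            # (1023, 3, 255)
print(decompose(300, 256))  # (1, 44)
```

Whether those 4 work-groups run on one compute unit or are spread across several is decided by the hardware scheduler, not by the indexing.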


The explanations in this thread will probably help:

Thanks very much for the reply! I think that cleared up most of my confusion.

One thing I’m still unsure about: inside each SIMD engine (which I assume corresponds to an MP) there are 16 thread processors, and each thread processor contains 5 stream cores. This is quite different from how NVIDIA GPUs define a core. Is that correct?

NVIDIA Fermi (GTX 480): 15 MPs × 32 cores = 480 cores
ATI Evergreen (Cypress): 20 SIMD engines × 16 thread processors × 5 stream cores = 1600 stream cores

How does an ATI thread processor compare to a core in NVIDIA devices? And how do ATI stream cores compare to NVIDIA cores?


There is no direct mapping between NVIDIA’s CUDA Cores and ATI’s SIMD Streaming Cores. I think this post sums it up quite appropriately.
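
One way to see why the counts do not line up: each Cypress thread processor is a 5-wide VLIW unit working on one work-item, so the 1600 figure only reflects operations per clock when the compiler fills all five slots with independent operations from that work-item, whereas a Fermi CUDA core executes one operation for one thread per clock. The rough Python sketch below compares peak ALU operations per clock under that simplified model; the function names and the `avg_slots_filled` parameter are hypothetical inputs, not measured values, and the model ignores clock speeds, scheduling details, and memory limits.

```python
# Rough per-clock ALU operation counts for the two chips discussed above,
# under a simplified, hypothetical model.

def fermi_ops_per_clock(num_sms=15, cores_per_sm=32):
    """Each CUDA core issues one operation for one thread per clock."""
    return num_sms * cores_per_sm


def cypress_ops_per_clock(avg_slots_filled, num_simds=20, tps_per_simd=16, vliw_width=5):
    """Each thread processor runs one work-item per clock; the compiler packs
    up to vliw_width independent operations from it into one VLIW bundle.
    avg_slots_filled: average number of slots actually used (1..vliw_width)."""
    if not 1 <= avg_slots_filled <= vliw_width:
        raise ValueError("avg_slots_filled must be between 1 and vliw_width")
    return num_simds * tps_per_simd * avg_slots_filled


print(fermi_ops_per_clock())     # 480
print(cypress_ops_per_clock(5))  # 1600 (every VLIW slot filled)
print(cypress_ops_per_clock(3))  # 960
```

In this model Cypress advances 320 work-items per clock against Fermi's 480 threads, so a "stream core" is closer to one VLIW slot than to a CUDA core, and an ATI thread processor is closer to a CUDA core that can do several independent operations at once.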