I’m a bit confused about how the concepts in CUDA and OpenCL map onto each other.

In OpenCL, is a “compute unit” physically a multiprocessor, or just a virtual work-group of many threads? It seems like a compute unit is identical to a thread block in CUDA, but in CUDA we can choose the block size as we like and decide how many blocks can be launched on a single multiprocessor. I’m not sure whether we can change the compute unit size, i.e. how many processing elements there are in one compute unit. And how many compute units can be launched on one multiprocessor?

Is a work-item a single core or just a single thread?

Thanks very much for the reply! I think that clears up most of my confusion.

One thing I’m still unsure about: inside each of the SIMD Engines (which I assume correspond to an MP), there are 16 Thread Processors, and in each of these Thread Processors there are 5 stream cores. This is quite different from an NVIDIA GPU in terms of how a core is defined. Is this correct?
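
To make the numbers concrete, here is a quick sketch of the arithmetic I have in mind. The 16 and 5 are the figures quoted above. The variable names are just mine, and treating every stream core as one countable "core" is my assumption, not something I have confirmed.

```python
# Counts taken from the description above (per SIMD Engine).
thread_processors_per_simd_engine = 16
stream_cores_per_thread_processor = 5

# Assumption: each stream core is counted as one "core".
stream_cores_per_simd_engine = (
    thread_processors_per_simd_engine * stream_cores_per_thread_processor
)

print(stream_cores_per_simd_engine)  # 80
```

So if a "core" means a stream core, one SIMD Engine would have 80 of them; if it means a Thread Processor, it would have only 16.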