When I queried my device for CL_DEVICE_MAX_COMPUTE_UNITS, I got a reply that I find strange: the returned value counts SMs rather than actual CUDA cores. My device (GT8600) has 4 SMs, each built from 8 CUDA cores. I expected to get 32, but the reply was 4.
Is this the expected behavior? Can someone confirm?
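That is the expected behavior. By definition, a work-group executes on a single compute unit, and that corresponds to an SM in NVIDIA's architecture. __local memory is shared among all work-items of a work-group, which also matches the shared memory of an NVIDIA SM. So CL_DEVICE_MAX_COMPUTE_UNITS reports the number of SMs, not the number of CUDA cores.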
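In case it helps, here is a minimal sketch of the query and of the arithmetic behind the 4 vs. 32 difference. The function names and the USE_OPENCL macro are made up for illustration, and the OpenCL part is only compiled when that macro is defined so the rest builds without the OpenCL headers. As far as I know, the core OpenCL API has no standard query for the number of cores per SM, so that figure (8 here, taken from the question) has to be supplied by hand.

```c
#ifdef USE_OPENCL            /* hypothetical build switch, not an OpenCL macro */
#include <CL/cl.h>
#endif

/* Hypothetical helper: estimated scalar core count, given the number of
 * compute units (SMs) and an assumed number of cores per SM. */
unsigned int estimate_scalar_cores(unsigned int compute_units,
                                   unsigned int cores_per_unit)
{
    return compute_units * cores_per_unit;
}

#ifdef USE_OPENCL
/* Returns CL_DEVICE_MAX_COMPUTE_UNITS for the first GPU of the first
 * platform, or 0 if any of the calls fails. */
unsigned int query_compute_units(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_uint units = 0;

    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS)
        return 0;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS)
        return 0;
    if (clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS,
                        sizeof(units), &units, NULL) != CL_SUCCESS)
        return 0;

    return (unsigned int)units;
}
#endif
```

Built with -DUSE_OPENCL and linked against the OpenCL library, query_compute_units() should return the SM count (4 on the device in the question), and estimate_scalar_cores(4, 8) gives the 32 cores the question expected.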