Since I develop with CUDA and OpenCL at the same time, I had to ask myself: which of the given properties and restrictions are actually fixed by the device hardware, which are defined by the driver (and therefore the same for CUDA and OpenCL), and which are only set by the runtime APIs?
memory size - of course the same in OpenCL and CUDA
max block size - the same too, but is it set by the driver, or is it a hardware specification?
max grid dimensions - OpenCL does not seem to have anything like that, so either the OpenCL compiler makes the kernel run multiple times when the launch is larger than the max grid size, or it is a constant of the CUDA API
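To make the first guess about grid dimensions concrete, here is a minimal sketch of the arithmetic I have in mind. It is only my assumption of how an oversized launch could be split into several smaller ones, not something I know any OpenCL implementation to do. The function name `split_launch` is made up for illustration, and 65535 is just an example value for the limit.

```python
def split_launch(total_blocks, max_grid_size):
    """Split a 1D launch into chunks of at most max_grid_size blocks.

    Returns a list of (block_offset, block_count) pairs, one per
    hypothetical sub-launch. Purely illustrative arithmetic.
    """
    if total_blocks < 0 or max_grid_size <= 0:
        raise ValueError("total_blocks must be >= 0 and max_grid_size > 0")

    launches = []
    offset = 0
    while offset < total_blocks:
        # Each sub-launch covers as many blocks as the limit allows.
        count = min(max_grid_size, total_blocks - offset)
        launches.append((offset, count))
        offset += count
    return launches


if __name__ == "__main__":
    # 65535 is an assumed example limit, not a value queried from a device.
    for block_offset, block_count in split_launch(150000, 65535):
        print(block_offset, block_count)
```

In such a scheme each sub-launch would have to add its block offset to the block index inside the kernel, so that the work items still see one contiguous index range.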
Is there an overview of what is actually specified by the hardware and what is set by the driver or runtime API?
I would also be very interested in a detailed overview of GPU hardware. The colorful pictures of multiprocessors and warps are fine, but somewhat sketchy…