The GTX 1070 has 1920 CUDA cores spread over 15 streaming multiprocessors (SMs), so each SM has 128 CUDA cores. However, according to NVIDIA's CUDA C Programming Guide, the maximum number of resident threads per multiprocessor is 2048.
Does that mean each CUDA core holds 16 resident threads (2048 / 128 = 16), so that a CUDA core is like 16 SPs combined?
If so, is communication between threads on different CUDA cores different from communication between threads on the same CUDA core?
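
For reference, here is the arithmetic behind the figure of 16, as a small Python sketch. The constants are just the numbers quoted above; the function names are my own and purely illustrative. The result is only the quotient of two published limits, and whether it says anything about how the hardware is actually organized is exactly what I am asking.

```python
# Figures quoted in the question (GTX 1070 specs and the
# CUDA C Programming Guide limit); nothing here is queried from a device.
TOTAL_CUDA_CORES = 1920
SM_COUNT = 15
MAX_RESIDENT_THREADS_PER_SM = 2048


def cores_per_sm(total_cores: int, sm_count: int) -> int:
    """CUDA cores per streaming multiprocessor (integer division)."""
    return total_cores // sm_count


def resident_threads_per_core(max_threads_per_sm: int, cores: int) -> int:
    """Ratio of the resident-thread limit to the core count of one SM."""
    return max_threads_per_sm // cores


if __name__ == "__main__":
    cores = cores_per_sm(TOTAL_CUDA_CORES, SM_COUNT)
    ratio = resident_threads_per_core(MAX_RESIDENT_THREADS_PER_SM, cores)
    print(f"CUDA cores per SM: {cores}")
    print(f"Resident threads per CUDA core (ratio only): {ratio}")
```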