Could you please explain the following?
The maximum number of blocks that can run concurrently on a multiprocessor is 8, and the maximum number of threads in a block is 512. Therefore 512 * 8 = 4096 threads should be able to run concurrently on a multiprocessor. So why is the limit just 768?
Why is a .cpp file included in a CUDA project? Why are .cu files not enough?
When should we write extern "C" before a function, and why?
Thanks in advance.