Newbie questions

Would you please explain the following?

The maximum number of blocks that can run concurrently on a multiprocessor is 8. The maximum number of threads in a block is 512. Therefore 512*8 = 4096 threads should run concurrently on a multiprocessor. But why is it just 768?

In a CUDA project, why is a .cpp file included? Why are .cu files not enough?

When should we write extern “C” before a function, and why?

Thanks in advance.

The quoted maximums are separate limits, each applying in its own right, i.e. you CANNOT have a block with more than 512 threads. All of them must hold at the same time, so you never get the product 512*8.

The 768-thread limit comes from the WARP limit per multiprocessor. A multiprocessor supports ONLY 24 warps, i.e. 24*32 = 768 threads.
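A minimal sketch of that arithmetic, assuming the compute capability 1.0/1.1 limits quoted in this thread and ignoring register and shared-memory usage (which can lower the count further). The constant and function names are made up for illustration; they are not part of the CUDA API.

```cpp
#include <algorithm>

// Per-multiprocessor limits quoted in this thread (compute capability 1.0/1.1).
// Later GPUs have different values.
constexpr int kWarpSize = 32;
constexpr int kMaxThreadsPerBlock = 512;
constexpr int kMaxBlocksPerMP = 8;
constexpr int kMaxWarpsPerMP = 24;  // 24 * 32 = 768 threads

// Number of blocks of `threadsPerBlock` threads that can be resident on one
// multiprocessor at once, considering only the block and warp limits.
int residentBlocksPerMP(int threadsPerBlock) {
    if (threadsPerBlock <= 0 || threadsPerBlock > kMaxThreadsPerBlock) {
        return 0;  // such a block cannot be launched at all
    }
    // Threads occupy whole warps, so a partial warp still takes a warp slot.
    int warpsPerBlock = (threadsPerBlock + kWarpSize - 1) / kWarpSize;
    // Whichever limit is hit first decides the count.
    return std::min(kMaxBlocksPerMP, kMaxWarpsPerMP / warpsPerBlock);
}

// Threads actually resident on the multiprocessor for that block size.
int residentThreadsPerMP(int threadsPerBlock) {
    return residentBlocksPerMP(threadsPerBlock) * threadsPerBlock;
}
```

With 512-thread blocks each block needs 16 warps, so only one block fits in the 24 warp slots and 512 threads are resident. With 256-thread blocks, three blocks fit and you reach the full 768. With 64-thread blocks the 8-block limit is hit first, leaving 512 threads.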

There is no need to include a .cpp file. It's up to you.

You don't have to write it. It's OK to leave out “extern”.

extern “C” as part of a function declaration tells a C++ compiler that the function has C linkage.
As far as I understand, this is done so that you can use the full C++ feature set on the one hand (by isolating such functions in separate .cpp files) while still being able to link to that code from CUDA on the other (by declaring the functions as extern “C”, which makes them “look” like C functions).
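A minimal sketch of that pattern, assuming a hypothetical helper `sum_array` that lives in a .cpp file and is called from a .cu file. Both parts are shown in one file so it compiles on its own; in a real project the declaration would sit in a shared header included by both files.

```cpp
#include <numeric>

// --- Shared declaration (e.g. in a header included by the .cu and .cpp files).
// The #ifdef lets the same header also be read by a plain C compiler.
#ifdef __cplusplus
extern "C" {
#endif
float sum_array(const float* data, int n);  // hypothetical helper
#ifdef __cplusplus
}
#endif

// --- Definition (e.g. in helpers.cpp, built by the ordinary host C++ compiler).
// The body may use any C++ feature (here std::accumulate); only the interface
// has C linkage, so the symbol is emitted under its plain, unmangled name.
extern "C" float sum_array(const float* data, int n) {
    return std::accumulate(data, data + n, 0.0f);
}
```

The calling file only needs the declaration: because both sides agree on C linkage, the linker matches the call and the definition by the plain name `sum_array`, whichever compiler produced each object file.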

Hmm, but now I am confused: when is the C++ compiler involved, and when is the NVCC compiler? Would you please tell me the steps of compilation and linking?

For .cu files, the developer has to maintain a custom build rule (right-click the .cu file, go to Properties, and enter the custom build command line and the output file) that compiles the .cu file to an .obj file.

NVCC invokes the C++ compiler for the HOST portions of the code. The device portion is compiled to a binary and most probably embedded in the object file as a separate section (a data section, perhaps as a data array…).

The NVCC programmer’s guide shows the various sub-steps of the compilation. It comes with your CUDA installation; just check c:\cuda\doc or wherever you installed CUDA.