Organization of threads

Hi. I am trying to write a program that will run on a GeForce 9500 GT GPU.
My problem is organizing the threads into blocks. If I knew some values for my device and the CUDA parameters, I could complete the program. The parameters I know are:

Maximum number of threads per block: 512 (depends on CUDA)
Maximum dimensions of a grid: 65535 x 65535 x 1 (depends on CUDA)
Maximum dimensions of a block: 512 x 512 x 64 (depends on CUDA)
Number of multiprocessors: 4 (depends on device)

But I cannot find the following values for my GPU:

  1. Maximum number of blocks which can be executed simultaneously on one multiprocessor.
  2. Maximum number of threads which can be executed simultaneously on one multiprocessor.

Where can I find these values for my GPU? I looked at my GPU's specifications on the CUDA site but could not find anything.

Also, could somebody explain what exactly "CUDA cores" means? Does each multiprocessor have 8 cores? Does each core execute one block at a time?

Thanks

The CUDA Programming Guide lists these values in Appendix F.

A single CUDA core does not execute a whole block. The CUDA cores of a multiprocessor work together to execute the instructions for an entire warp at once. A warp is 32 threads, and in your GPU there are 8 CUDA cores per multiprocessor, so the multiprocessor can complete one warp instruction in 4 clock cycles (32 / 8). Warps are executed in any order from the blocks assigned to that multiprocessor.