What data to enter into the Occupancy Calculator?

I am not sure what data to enter into the Occupancy Calculator for these fields:

Registers per thread
Shared memory per block

I have an NVIDIA GeForce GTX 960M (compute_50,sm_50) with five SMs.

While compiling my program, I get:

1>ptxas info : 31 bytes gmem, 152 bytes cmem[3]
1>ptxas info : Compiling entry function ‘_Z8blackcatv’ for ‘sm_50’
1>ptxas info : Function properties for _Z8blackcatv
1> 352 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1>ptxas info : Used 86 registers, 3 bytes smem, 320 bytes cmem[0], 24 bytes cmem[2]

I use just one device function with:

# define blocks 1
# define threads 496

How many blocks and threads can I use?

So enter 86 for registers per thread and 3 bytes for shared memory per block. Both values come from the "Used 86 registers, 3 bytes smem" line of the ptxas output.

What are your kernel launch parameters and what is the kernel duration?

When I enter:
Compute Capability version 5.0
Threads per block 640
Registers per thread 86
Shared memory per block 3

It gives me:
Maximum Thread Blocks Per Multiprocessor
Limited by Max Warps / Blocks per Multiprocessor 4
Limited by Registers per Multiprocessor 1
Limited by Shared Memory per Multiprocessor 256

So since my GPU has 5 SMs, can I use 1 * 5 (SMs) = 5 thread blocks with a block size of 640 threads?

Can I define:

# define blocks 5
# define threads 640

I use just one

__global__ void blackcat(void) {...}

function which can run 1 hour or more.

Yes, that is what that means. You select the lowest limit of blocks per multiprocessor, which in your case is 1.
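
To make the "lowest limit" rule concrete, here is a rough host-side sketch of the arithmetic the calculator performs. The per-SM limits in SmLimits are my assumptions for compute capability 5.0 (64 warps, 32 blocks, 65536 registers, 64 KB shared memory, allocation units of 256), and the names SmLimits, BlockLimits and blocksPerSm are made up for this illustration; please check the limits against the documentation for your toolkit. Under these assumptions, 640 threads (20 warps) gives a warp limit of 3 rather than the 4 shown above, which is what a 496-thread block (16 warps) would give. Either way the register limit of 1 is the smallest, so the conclusion does not change.

```cpp
#include <algorithm>

// Assumed per-SM limits for compute capability 5.0 (not taken from the thread).
struct SmLimits {
    int maxWarps = 64;             // resident warps per SM
    int maxBlocks = 32;            // resident blocks per SM
    int registers = 65536;         // 32-bit registers per SM
    int sharedBytes = 65536;       // shared memory per SM, in bytes
    int regAllocUnit = 256;        // registers are allocated per warp in units of 256
    int warpAllocGranularity = 4;  // warps are allocated in groups of 4
    int smemAllocUnit = 256;       // shared memory is allocated in units of 256 bytes
    int warpSize = 32;
};

// Per-SM block limits from each resource, plus the resulting minimum.
struct BlockLimits {
    int byWarps;
    int byRegisters;
    int bySharedMem;
    int resident;
};

static int roundUp(int value, int unit)   { return (value + unit - 1) / unit * unit; }
static int roundDown(int value, int unit) { return value / unit * unit; }

// Hypothetical helper mirroring the calculator's three limits.
BlockLimits blocksPerSm(int threadsPerBlock, int regsPerThread, int smemPerBlock,
                        const SmLimits& sm = SmLimits()) {
    int warpsPerBlock = (threadsPerBlock + sm.warpSize - 1) / sm.warpSize;

    // Limit 1: warps (and blocks) per SM.
    int byWarps = std::min(sm.maxBlocks, sm.maxWarps / warpsPerBlock);

    // Limit 2: registers, rounded up per warp, then warps rounded down to the granularity.
    int regsPerWarp = roundUp(regsPerThread * sm.warpSize, sm.regAllocUnit);
    int warpsByRegs = roundDown(sm.registers / regsPerWarp, sm.warpAllocGranularity);
    int byRegisters = warpsByRegs / warpsPerBlock;

    // Limit 3: shared memory, rounded up to the allocation unit.
    int bySharedMem = sm.maxBlocks;
    if (smemPerBlock > 0) {
        bySharedMem = sm.sharedBytes / roundUp(smemPerBlock, sm.smemAllocUnit);
    }

    // The number of resident blocks per SM is the smallest of the three limits.
    int resident = std::min(byWarps, std::min(byRegisters, bySharedMem));
    return {byWarps, byRegisters, bySharedMem, resident};
}
```

Calling blocksPerSm(640, 86, 3) with these assumed limits yields a register limit of 1 and a shared memory limit of 256, matching the calculator output quoted above, and therefore 1 resident block per SM.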

Again, this is a statement of occupancy: it is the number of blocks that can be simultaneously resident on the SMs. You can certainly launch a kernel with more blocks than that, but only that many blocks will run “simultaneously”.
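
As a small illustration of what launching more blocks than can be resident means, the sketch below counts how many rounds of resident blocks a launch would need. The function names are hypothetical, and thinking in whole "rounds" is a simplification: in practice a waiting block is scheduled as soon as a resident block retires, not in lock-step.

```cpp
// Hypothetical helper: total blocks that can be resident on the GPU at once.
int residentBlocksOnGpu(int residentPerSm, int numSms) {
    return residentPerSm * numSms;
}

// Hypothetical helper: rounds of resident blocks needed for a grid of gridBlocks
// blocks (ceiling division; assumes residentPerSm and numSms are positive).
int roundsNeeded(int gridBlocks, int residentPerSm, int numSms) {
    int resident = residentBlocksOnGpu(residentPerSm, numSms);
    return (gridBlocks + resident - 1) / resident;
}
```

With 1 resident block per SM and 5 SMs, a grid of 5 blocks fits in a single round, while a grid of 12 blocks would need 3.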

