What data should I enter into the Occupancy Calculator?

I am not sure what data to enter into the Occupancy Calculator for these fields:

Registers per thread
Shared memory per block

I have an NVIDIA GeForce GTX 960M (compute_50, sm_50) with five SMs.

When compiling my program, ptxas reports:

1>ptxas info : 31 bytes gmem, 152 bytes cmem[3]
1>ptxas info : Compiling entry function '_Z8blackcatv' for 'sm_50'
1>ptxas info : Function properties for _Z8blackcatv
1> 352 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
1>ptxas info : Used 86 registers, 3 bytes smem, 320 bytes cmem[0], 24 bytes cmem[2]
1>kernel.cu

I use just one device function, launched with:

#define blocks 1
#define threads 496

How many blocks and threads can I use?

Enter 86 for registers per thread and 3 bytes for shared memory per block. Both values come from the "Used 86 registers, 3 bytes smem" line of your ptxas output.

What are your kernel launch parameters and what is the kernel duration?

When I enter:
Compute Capability version 5.0
Threads per block 640
Registers per thread 86
Shared memory per block 3

It gives me:
Maximum Thread Blocks Per Multiprocessor
Limited by Max Warps / Blocks per Multiprocessor 4
Limited by Registers per Multiprocessor 1
Limited by Shared Memory per Multiprocessor 256

So, since my GPU has 5 SMs, can I use 1 * 5 (SMs) = 5 thread blocks with a block size of 640 threads?

Can I define:

#define blocks 5
#define threads 640

I use just one

__global__ void blackcat(void) {...}

function, which can run for an hour or more.

Yes, that is what that means. You select the lowest limit of blocks per multiprocessor, which in your case is 1.
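To make the "lowest limit" rule concrete, here is a rough host-side C++ sketch of the arithmetic, not the calculator's actual implementation. The SmLimits constants are my assumptions for compute capability 5.0 (taken from the values the calculator lists for that architecture), and all function names are made up for illustration, so please check them against your own device. One thing to note: with these constants a 640-thread block (20 warps) gives a warp limit of 3, while the 4 quoted above is what a 496-thread block (16 warps) would give. The register limit of 1 is the binding one in either case.

```cpp
#include <algorithm>
#include <cassert>

// Assumed per-SM limits for compute capability 5.0 (hypothetical struct;
// verify the numbers for your own GPU before relying on them).
struct SmLimits {
    int warpSize             = 32;
    int maxWarpsPerSm        = 64;     // 2048 threads / 32
    int maxBlocksPerSm       = 32;
    int regsPerSm            = 65536;
    int regAllocUnit         = 256;    // registers are allocated per warp in this unit
    int warpAllocGranularity = 4;
    int smemPerSm            = 65536;  // bytes
    int smemAllocUnit        = 256;    // bytes
};

int roundUp(int value, int unit)   { return ((value + unit - 1) / unit) * unit; }
int roundDown(int value, int unit) { return (value / unit) * unit; }

// Number of warps one block occupies (partial warps count as whole warps).
int warpsPerBlock(int threadsPerBlock, const SmLimits& sm) {
    assert(threadsPerBlock > 0);
    return (threadsPerBlock + sm.warpSize - 1) / sm.warpSize;
}

// "Limited by Max Warps / Blocks per Multiprocessor"
int limitByWarps(int threadsPerBlock, const SmLimits& sm) {
    return std::min(sm.maxBlocksPerSm,
                    sm.maxWarpsPerSm / warpsPerBlock(threadsPerBlock, sm));
}

// "Limited by Registers per Multiprocessor"
int limitByRegisters(int threadsPerBlock, int regsPerThread, const SmLimits& sm) {
    assert(regsPerThread > 0);
    int regsPerWarp  = roundUp(regsPerThread * sm.warpSize, sm.regAllocUnit);
    int warpsThatFit = roundDown(sm.regsPerSm / regsPerWarp, sm.warpAllocGranularity);
    return warpsThatFit / warpsPerBlock(threadsPerBlock, sm);
}

// "Limited by Shared Memory per Multiprocessor"
int limitBySharedMemory(int smemPerBlock, const SmLimits& sm) {
    if (smemPerBlock <= 0) return sm.maxBlocksPerSm;  // no shared memory used
    return sm.smemPerSm / roundUp(smemPerBlock, sm.smemAllocUnit);
}

// Resident blocks per SM: the lowest of the three limits.
int residentBlocksPerSm(int threadsPerBlock, int regsPerThread,
                        int smemPerBlock, const SmLimits& sm) {
    return std::min({limitByWarps(threadsPerBlock, sm),
                     limitByRegisters(threadsPerBlock, regsPerThread, sm),
                     limitBySharedMemory(smemPerBlock, sm)});
}
```

For the numbers in this thread, residentBlocksPerSm(640, 86, 3, SmLimits{}) evaluates to 1: 86 registers per thread round up to 2816 registers per warp, only 20 warps fit in the register file, and one 640-thread block already needs 20 warps.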

Again, this is a statement of occupancy. That is how many blocks can be simultaneously resident on the SMs. You can certainly launch a kernel with more blocks than that, but only that many blocks will run “simultaneously”.
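As a small illustration of the difference between resident blocks and launched blocks, the following C++ sketch estimates how many "waves" of blocks a launch needs. The function names are hypothetical, and the wave count is only an approximation: the hardware hands out a new block as soon as a resident one finishes, so this is a mental model rather than a description of the actual scheduler.

```cpp
#include <cassert>

// Blocks that can be resident on the whole GPU at the same time.
int residentBlocksOnDevice(int blocksPerSm, int smCount) {
    assert(blocksPerSm > 0 && smCount > 0);
    return blocksPerSm * smCount;
}

// Rough number of waves needed to run gridBlocks blocks when only
// blocksPerSm * smCount of them can be resident at once (ceiling division).
int estimatedWaves(int gridBlocks, int blocksPerSm, int smCount) {
    assert(gridBlocks > 0);
    int resident = residentBlocksOnDevice(blocksPerSm, smCount);
    return (gridBlocks + resident - 1) / resident;
}
```

With 1 resident block per SM and 5 SMs, a 5-block launch fits in a single wave, while a 12-block launch would need roughly 3 waves, with the later blocks waiting for earlier ones to retire.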
