GTX 1080 - CUDA core architecture

Hi all,

My colleagues and I conducted research that uses the GTX 1080 to accelerate internet packet processing.
I want to formulate a theoretical speedup, but I can't find official details of NVIDIA's CUDA cores.
In particular, I can't find how many integer operations a single CUDA core can process in one cycle.

If someone can post a link to a detailed explanation of NVIDIA's Pascal CUDA cores, I would appreciate it.

Thanks in advance for your help

The throughput table in section 5.4.1, Arithmetic Instructions, of the CUDA programming guide gives the theoretical peak number of results per clock cycle per multiprocessor for each compute capability (Pascal is 6.x; the GTX 1080 is 6.1) and for each category of arithmetic instruction.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#maximize-instruction-throughput

These are theoretical peak throughputs and are hard to achieve in practice.
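
To turn a per-multiprocessor figure from that table into a whole-GPU number, multiply it by the multiprocessor count and the clock rate. Below is a rough sketch of that arithmetic, not an official formula. The function name `peak_ops_per_second` is made up for illustration, and the GTX 1080 figures (20 multiprocessors, 1733 MHz boost clock, 128 32-bit integer adds per clock per multiprocessor for compute capability 6.1) are assumptions that should be checked against the table and your card's actual clocks.

```python
# Rough sketch: theoretical peak throughput for one instruction category.
# All hardware figures below are assumptions; verify them for your card.

def peak_ops_per_second(results_per_clock_per_sm, sm_count, clock_hz):
    """Peak results per second = per-SM results per clock * SM count * clock (Hz)."""
    return results_per_clock_per_sm * sm_count * clock_hz

# Assumed GTX 1080 figures (not taken from the thread).
GTX1080_SM_COUNT = 20                # multiprocessors (SMs) on the GPU
GTX1080_BOOST_CLOCK_HZ = 1.733e9     # nominal boost clock; real clocks vary
INT32_ADD_PER_CLOCK_PER_SM = 128     # table value assumed for compute capability 6.1

peak = peak_ops_per_second(
    INT32_ADD_PER_CLOCK_PER_SM, GTX1080_SM_COUNT, GTX1080_BOOST_CLOCK_HZ
)
print(f"Theoretical peak 32-bit integer adds: {peak / 1e12:.2f} tera-ops/s")
```

Other instruction categories (integer multiply, shifts, comparisons) have different per-clock values in the table, so the same calculation has to be repeated per category, and memory bandwidth will usually limit packet workloads well before this arithmetic peak is reached.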

Thank you, this is exactly what I was looking for.