CUDA functions performance

Hi all

Where can I find a document that lists timing figures for all CUDA functions?


What do you need: latency or throughput? :) Are you familiar with the SIMT architecture itself?

First, I'm new to CUDA, and I'm trying to check whether its performance fits my application.
So I want to know how much time a few functions take.
I understand that it depends on how many GPUs I use. I just want to know the timing for one GPU.


Basically, a single GPU has ~1000 GPU cores running at ~1 GHz, so it can perform ~10^12 arithmetic operations per second (additions, multiplications, shifts, and so on). This is often quoted in TFLOPS; for example, the GeForce GTX 1080 is rated at roughly 8 to 9 TFLOPS in single precision.
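
To make that back-of-the-envelope estimate concrete, here is a minimal host-side sketch in C++. The function names (peak_ops_per_second, min_seconds) and the ops_per_core_per_cycle parameter are made up for illustration and are not part of any CUDA API. The result is only a theoretical upper bound: real kernels are often limited by memory bandwidth or latency, so actual timings have to be measured.

```cpp
#include <cassert>
#include <cstdint>

// Theoretical peak arithmetic throughput, in operations per second.
// All inputs are illustrative parameters supplied by the caller;
// nothing here is queried from a real device.
double peak_ops_per_second(std::uint32_t cores, double clock_hz,
                           double ops_per_core_per_cycle = 1.0) {
    assert(clock_hz >= 0.0 && ops_per_core_per_cycle >= 0.0);
    return static_cast<double>(cores) * clock_hz * ops_per_core_per_cycle;
}

// Lower bound on the run time (in seconds) of a job that needs
// `total_ops` arithmetic operations, assuming the peak rate is
// sustained for the whole job. Measured times will be longer.
double min_seconds(double total_ops, double peak_ops) {
    assert(peak_ops > 0.0);
    return total_ops / peak_ops;
}
```

With the numbers from the post, peak_ops_per_second(1000, 1.0e9) gives 10^12 operations per second, so a job of 10^9 additions cannot finish in less than about a millisecond. Vendor FLOPS figures usually count a fused multiply-add as two operations, which corresponds to passing 2.0 as the third argument.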