About latency (very basics)

dksellou · March 14, 2014, 3:48pm

I am learning about latency and throughput and I have some questions:

Example: if we have an application in image processing and I want to use two options:
option A use only my CPU with 1 core and
option B use only my GPU with, let’s say, up to 10,000 cores.

I have a video of colored images and would like to convert the frames in the video to gray-scale.

If it takes the CPU core 1 microsecond (10^-6 sec) to convert one pixel of the image from RGB color to gray scale and
0.5 millisecond (0.5 * 10^-3 sec) for the GPU kernel (using the Map operation we can pass 10,000 pixels to the GPU to process in parallel without affecting per-pixel performance, meaning we can run 10,000 threads in parallel in appropriately sized blocks and grids),

then is the latency of the CPU 10,000 threads * 10^-6 sec = 0.01 seconds and

the latency of the GPU 0.5 milliseconds (0.0005 seconds) for all 10,000 threads, since they can run in parallel and each pixel takes 0.5 milliseconds (or 0.0005 seconds)???
Am I missing something?

what about throughput?

Throughput of CPU for 10,000 threads is:
10,000 threads / 0.01 = 1,000,000

Throughput of GPU for 10,000 threads is:
10,000 threads /0.0005 seconds = 20,000,000

Right?
Please help…I don’t know if I am getting these concepts correctly!
Thanks!

Robert_Crovella · March 14, 2014, 5:54pm

Latency is the time from when a request is made for a transaction or operation until that operation completes. This may vary according to conditions, such as other activity that is going on concurrently. It may carry a connotation like “worst case” or “average”, or it may simply be a measurement.

Throughput is the number of transactions or operations that complete per unit time, according to some conditions or assumptions, like “max theoretical”, “max realizable”, or “average”, or it is simply a measurement.

To answer your question, you need to define the latency you are looking for. The latency to convert a single frame? (How many pixels are in a frame?) You also need to define the throughput metric you are looking for. The throughput of frames converted per second? A sensible latency unit might be seconds/frame. A sensible throughput unit might be frames/sec.

For the types of calculations you seem to be outlining, the latency and throughput are probably closely related (reciprocals of each other). Latency and throughput really only communicate different information when you have a pipeline or some other mechanism that has certain start-up characteristics or delays, but once the pipeline is primed, it can create a new result on every clock, for example.
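To illustrate the pipeline case, here is a minimal sketch of an idealized pipeline. The function name `pipeline_times` and the numbers (5 stages, a 1 ns clock, one million items) are made up for illustration and do not describe any real hardware.

```python
def pipeline_times(stages, clock_s, items):
    """Return (latency_s, total_s, throughput_per_s) for an ideal pipeline.

    Assumes every stage takes exactly one clock and that, once the
    pipeline is primed, one result completes on every clock.
    """
    latency = stages * clock_s                 # time for one item, start to finish
    total = (stages + items - 1) * clock_s     # fill the pipeline, then one item per clock
    return latency, total, items / total


# Hypothetical numbers, chosen only for illustration.
latency, total, throughput = pipeline_times(stages=5, clock_s=1e-9, items=1_000_000)
print(f"latency per item:  {latency:.3e} s")
print(f"1 / latency:       {1 / latency:.3e} items/s")
print(f"actual throughput: {throughput:.3e} items/s")
```

In this sketch the throughput approaches one item per clock, which is about `stages` times higher than the reciprocal of the per-item latency. Without that kind of overlap, the two numbers are simply reciprocals.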

For your example, and assuming there are 10,000 pixels in one frame, with no other information to suggest other architectural delays or issues, the CPU would require (10,000 pixels/frame) * (1us/pixel) = 0.01 s/frame (for latency) for a single thread. The throughput in frames/s would just be the inverse of this, i.e. 100 frames/sec.

The GPU would require (10,000 pixels/frame) * (0.5 ms/pixel) = 5 s/frame (for latency) for a single thread. If we have 10,000 threads operating in parallel, we divide this by 10,000 and get a latency of 0.5 ms/frame for 10,000 threads. The throughput (for 10,000 threads) is just the inverse of this, i.e. 2000 frames/s.
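The same arithmetic as a short script, for anyone who wants to check it. This is only a sketch: the constants are the assumptions of this thread (10,000 pixels per frame, 10,000 GPU threads running fully in parallel, no other overheads), and the variable names are my own.

```python
# Assumptions taken from the example in this thread.
PIXELS_PER_FRAME = 10_000
CPU_SECONDS_PER_PIXEL = 1e-6     # 1 microsecond per pixel on one CPU core
GPU_SECONDS_PER_PIXEL = 0.5e-3   # 0.5 ms per pixel per GPU thread
GPU_PARALLEL_THREADS = 10_000    # assumed to run truly in parallel

# One CPU core processes the pixels one after another.
cpu_latency = PIXELS_PER_FRAME * CPU_SECONDS_PER_PIXEL  # seconds per frame

# The GPU work is spread evenly over the parallel threads.
gpu_latency = PIXELS_PER_FRAME * GPU_SECONDS_PER_PIXEL / GPU_PARALLEL_THREADS

# With no pipelining, frames/s is the reciprocal of seconds/frame.
cpu_fps = 1.0 / cpu_latency
gpu_fps = 1.0 / gpu_latency

print(f"CPU: {cpu_latency:.4f} s/frame, {cpu_fps:.0f} frames/s")
print(f"GPU: {gpu_latency:.4f} s/frame, {gpu_fps:.0f} frames/s")
```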

dksellou · March 15, 2014, 3:50am

Thank you so much for taking the time to respond!
I really appreciate it.
And I understand most of your points.
But here are more questions.

It seems like we agree on the latency for CPU (0.01)…but this would be seconds per pixel, not seconds per frame, right?
And also, if throughput is the ratio of number of tasks to the latency,
then would this be 10,000 threads/ 0.01 = 1,000,000 ?

As for the GPU’s latency, what is the latency for 10,000 threads? I think since it takes 0.5 milliseconds to do one thread (one pixel) and since all 10,000 pixels can pass through 10,000 threads in parallel, would this mean that the time it takes for one thread is the same time it takes for 10,000, so would this be 0.5 milliseconds for all 10,000 threads as well? Would this be the latency of the GPU?
And again, since throughput is number of tasks / latency, would it make sense to say 10,000 / 0.5 milliseconds = 20,000,000?

I am so confused!!!

dksellou · March 18, 2014, 3:38am

Anybody please help!
I don’t think the result is 2000 frames/sec as Txbob says.
Is it 10,000 / 0.0005 seconds (0.5 ms), which is 20,000,000?
but the number is so big though…

Tiomat · March 18, 2014, 10:13am

I would disagree that latency and throughput are inverses of each other, simply because it’s a parallel problem and not a serial one.

The latency of a single pixel is simply 0.5ms.
The latency of a single frame is how long it takes to do a single frame, which, assuming the 10,000 pixel image and 10,000 computational cores, is 10,000 / 10,000 * 0.5 ms, which is 0.5 ms.

The throughput of pixels is how many single pixels you can do per second, which is 10,000 times the number each thread can do per second. Each thread does 1 / 0.5 ms = 1 / 0.0005 s = 2000 pixels/sec, so the total is 20,000,000 pixels/sec.

The throughput of frames is how many frames you can do per second, which is 20,000,000 pixels/sec divided by the pixels per frame (10,000), giving you a frame throughput of 2000 frames/sec. This works out simply as the inverse of the latency of a single pixel because you have the same number of pixels as computational cores.

Generalising the problem so that C is the number of cores (which can all run in parallel, this is not number of threads launched), and P is the number of pixels per image and T is the time for a single core to do a single pixel:

The latency of a single pixel is T.
The latency of a single frame is T * P / C, where C is capped at P (cores beyond one per pixel sit idle)

The throughput of pixels is C * (1 / T)
The throughput of frames is C * (1 / T) / P, where C is capped at P

These formulae hold both for the GPU and CPU versions as long as you tweak C and T accordingly.
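As a rough sketch, the formulae could be written as small helper functions. The function names are my own, and the code assumes ideal parallelism with no overheads and a P that divides evenly across the cores.

```python
def frame_latency(t_pixel, pixels, cores):
    """Seconds to finish one frame: T * P / C, with C capped at P."""
    usable = min(cores, pixels)  # cores beyond one per pixel sit idle
    return t_pixel * pixels / usable


def pixel_throughput(t_pixel, cores):
    """Pixels per second across all cores: C * (1 / T)."""
    return cores / t_pixel


def frame_throughput(t_pixel, pixels, cores):
    """Frames per second: C * (1 / T) / P, with C capped at P."""
    usable = min(cores, pixels)
    return usable / t_pixel / pixels


# The two configurations from this thread: (T in seconds, C cores).
P = 10_000
for name, t, c in [("CPU", 1e-6, 1), ("GPU", 0.5e-3, 10_000)]:
    print(
        f"{name}: pixel latency {t:g} s, "
        f"frame latency {frame_latency(t, P, c):g} s, "
        f"{pixel_throughput(t, c):,.0f} pixels/s, "
        f"{frame_throughput(t, P, c):,.0f} frames/s"
    )
```

The per-pixel latency is lower on the CPU (1 microsecond versus 0.5 ms), while both throughput figures are higher on the GPU.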

tl;dr You were right in your first post. Rule of thumb is CPU has lower latency for individual parts, but GPU has higher throughput where parallelisation can occur.

One additional note: GPU computing doesn’t quite hit these theoretical formulae, due to additional overheads and because there are often serial components to parallel problems (memory accesses etc.).

dksellou · March 19, 2014, 2:52am

Thank you Tiomat!
I agree with the latency and throughput for GPU

As for the CPU, would the latency for one pixel be 0.000001 sec and for 10,000 pixels 0.01 sec (10,000 x 0.000001)? Is it right to say this?
And the throughput for CPU is 1,000,000 pixels/sec (10,000 pixels / 0.01 sec)?

Tiomat · March 20, 2014, 10:10am

Using the formulae, on the CPU the latency of a single pixel is 1 microsecond, and one image of 10,000 pixels would be 10 milliseconds. The throughput of pixels on the CPU is 1 * (1/T), or 1,000,000 pixels/sec as you said :) It sounds like you have got it.
