I am learning about latency and throughput and I have some questions:

Example: suppose we have an image-processing application and I want to compare two options:
Option A: use only my CPU with 1 core, and
Option B: use only my GPU with, let’s say, up to 10,000 cores.

I have a video of colored images and would like to convert the frames in the video to gray-scale.

If it takes the CPU core 1 microsecond (10^-6 sec) to convert one pixel of the image from RGB color to gray scale and
0.5 millisecond (0.5*10^-3 sec) for the GPU kernel (using the Map operation we can pass 10,000 pixels to the GPU to process in parallel without affecting per-pixel performance, meaning we can run 10,000 threads in parallel in appropriately sized blocks and grids),

then is the latency of the CPU 10,000 threads * 10^-6 sec = 0.01 seconds and

the latency of the GPU 0.5 milliseconds (0.0005 seconds) for all 10,000 threads, since they can run in parallel and each pixel takes 0.5 milliseconds (or 0.0005 seconds)?
Am I missing something?

Throughput of CPU for 10,000 threads is:
10,000 threads / 0.01 seconds = 1,000,000

Throughput of GPU for 10,000 threads is:
10,000 threads / 0.0005 seconds = 20,000,000

Right?
Thanks!

Latency is the time from when a request is made for a transaction or operation until that operation completes. This may vary according to conditions, such as other activity that is going on concurrently. It may carry a connotation like “worst case” or “average”, or it may simply be a measurement.

Throughput is the number of transactions or operations that complete per unit time, according to some conditions/assumptions, like “max theoretical”, “max realizable”, “average”, etc., or it is simply a measurement.

To answer your question, you need to define the latency you are looking for. The latency to convert a single frame? (How many pixels are in a frame?) You also need to define the throughput metric you are looking for. The throughput of frames converted per second? A sensible latency unit might be seconds/frame. A sensible throughput unit might be frames/sec.

For the types of calculations you seem to be outlining, the latency and throughput are probably closely related (reciprocals of each other). Latency and throughput really only communicate different information when you have a pipeline or some other mechanism that has certain start-up characteristics or delays, but once the pipeline is primed, it can create a new result on every clock, for example.
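To make the pipeline point concrete, here is a minimal sketch in Python. The stage count and clock period are hypothetical values chosen only for illustration; they do not describe any particular CPU or GPU. It shows that once a pipeline is primed, throughput is one result per clock, which is not the reciprocal of the per-item latency.

```python
# Hypothetical pipeline: 5 stages, one stage per 1 ns clock (illustrative values only).
STAGES = 5
CLOCK_S = 1e-9


def pipeline_latency(stages, clock_s):
    """Seconds for one item to pass through every stage."""
    return stages * clock_s


def pipeline_throughput(clock_s):
    """Items per second once the pipeline is primed: one result per clock."""
    return 1.0 / clock_s


def pipeline_total_time(items, stages, clock_s):
    """Seconds to finish `items` items: fill the pipeline, then one result per clock."""
    return (stages + items - 1) * clock_s


latency = pipeline_latency(STAGES, CLOCK_S)
throughput = pipeline_throughput(CLOCK_S)

print(f"latency     : {latency:.2e} s/item")
print(f"throughput  : {throughput:.2e} items/s")
print(f"1 / latency : {1.0 / latency:.2e} items/s (smaller than the throughput)")
```

With these assumed numbers the primed throughput is five times larger than 1/latency, because five items are in flight at once.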

For your example, and assuming there are 10,000 pixels in one frame, with no other information to suggest other architectural delays or issues, the CPU would require (10,000 pixels/frame) * (1us/pixel) = 0.01 s/frame (for latency) for a single thread. The throughput in frames/s would just be the inverse of this, i.e. 100 frames/sec.

The GPU would require (10,000 pixels/frame) * (0.5ms/pixel) = 5 s/frame (for latency) for a single thread. If we have 10,000 threads operating in parallel, we divide this by 10,000 and get a latency of 0.5ms/frame for 10,000 threads. The throughput (for 10,000 threads) is just the inverse of this, i.e. 2000 frames/s.
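The arithmetic for both devices can be checked with a few lines of Python. This is only a sketch of the idealized calculation, assuming 10,000 pixels per frame and no other delays; the variable names are mine.

```python
# Figures from the example; 10,000 pixels per frame is an assumption.
PIXELS_PER_FRAME = 10_000
CPU_S_PER_PIXEL = 1e-6      # 1 microsecond per pixel on one CPU core
GPU_S_PER_PIXEL = 0.5e-3    # 0.5 millisecond per pixel on one GPU thread
GPU_THREADS = 10_000        # threads assumed to run fully in parallel

# CPU: one thread processes every pixel in turn.
cpu_latency_s = PIXELS_PER_FRAME * CPU_S_PER_PIXEL   # seconds per frame
cpu_fps = 1.0 / cpu_latency_s                        # frames per second

# GPU: time for a single thread, then divided across the parallel threads.
gpu_one_thread_s = PIXELS_PER_FRAME * GPU_S_PER_PIXEL
gpu_latency_s = gpu_one_thread_s / GPU_THREADS       # seconds per frame
gpu_fps = 1.0 / gpu_latency_s                        # frames per second

print(f"CPU: {cpu_latency_s:.4f} s/frame, {cpu_fps:.0f} frames/s")
print(f"GPU, one thread: {gpu_one_thread_s:.1f} s/frame")
print(f"GPU, {GPU_THREADS} threads: {gpu_latency_s:.4f} s/frame, {gpu_fps:.0f} frames/s")
```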

Thank you so much for taking the time to respond!
I really appreciate it.
And I understand most of your points.
But here are more questions.

It seems like we agree on the latency for CPU (0.01)…but this would be seconds per pixel, not seconds per frame, right?
And also, if throughput is the ratio of number of tasks to the latency,
then would this be 10,000 threads/ 0.01 = 1,000,000 ?

As for the GPU’s latency, what is the latency for 10,000 threads? I think since it takes 0.5 milliseconds to do one thread (one pixel) and since all 10,000 pixels can pass through 10,000 threads in parallel, would this mean that the time it takes for one thread is the same time it takes for 10,000, so would this be 0.5 milliseconds for all 10,000 threads as well? Would this be the latency of the GPU?
And again, since throughput is number of tasks / latency, would it make sense to say 10,000 / 0.5 milliseconds = 20,000,000?

I am so confused!!!

I don’t think the result is 2000 frames /sec as Txbob says.
Is it 10,000 / 0.0005 seconds (0.5 ms), which is 20,000,000?
but the number is so big though…

I would disagree that latency and throughput are inverses of each other, simply because it’s a parallel problem and not a serial one.

The latency of a single pixel is simply 0.5ms.
The latency of a single frame is how long it takes to do a single frame, which, assuming the 10,000 pixel image and 10,000 computational cores, is 10,000 / 10,000 * 0.5ms, which is 0.5ms.

The throughput of pixels is how many single pixels you can do per second, which is 10,000 times the number each thread can do per second (1/0.5ms, or 1/0.0005 = 2000), giving 20,000,000 pixels/sec.

The throughput of frames is how many frames you can do per second, which is 20,000,000 pixels/sec divided by the 10,000 pixels per frame, giving you a frame throughput of 2000 frames/sec. This works out simply as the inverse of the latency of a single pixel because you have the same number of pixels as computational cores.

Generalising the problem so that C is the number of cores (which can all run in parallel, this is not number of threads launched), and P is the number of pixels per image and T is the time for a single core to do a single pixel:

The latency of a single pixel is T.
The latency of a single frame is T * P / C where C can never be > P

The throughput of pixels is C * (1 / T)
The throughput of frames is C * (1 / T) / P where C can never be > P

These formulae hold both for the GPU and CPU versions as long as you tweak C and T accordingly.
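As a quick check, the formulae can be written as small Python functions and evaluated for both devices. This is only a sketch: the function names are mine, and capping C at P with min() is my reading of "C can never be > P".

```python
def frame_latency(T, P, C):
    """Seconds per frame: T * P / C, with the usable cores capped at P."""
    return T * P / min(C, P)


def pixel_throughput(T, C):
    """Pixels per second: C * (1 / T)."""
    return C * (1.0 / T)


def frame_throughput(T, P, C):
    """Frames per second: C * (1 / T) / P, with the usable cores capped at P."""
    return min(C, P) * (1.0 / T) / P


P = 10_000  # pixels per frame, as assumed in this thread

# (name, T in seconds per pixel, C parallel cores)
for name, T, C in [("CPU", 1e-6, 1), ("GPU", 0.5e-3, 10_000)]:
    print(
        f"{name}: pixel latency {T:.1e} s, "
        f"frame latency {frame_latency(T, P, C):.1e} s, "
        f"{pixel_throughput(T, C):.1e} pixels/s, "
        f"{frame_throughput(T, P, C):.0f} frames/s"
    )
```

Changing only T and C switches between the CPU and GPU cases, which is the point of the generalisation.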

tl;dr You were right in your first post. Rule of thumb is CPU has lower latency for individual parts, but GPU has higher throughput where parallelisation can occur.

Additional note to remember though is that GPU computing doesn’t quite hit these theoretical formulae due to additional overheads, and because there are often serial components to parallel problems (memory accesses etc).
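To see how much an overhead can matter, here is a rough sketch that adds a fixed per-frame cost to the ideal frame latency. The overhead value is a made-up placeholder, not a measurement of any real GPU, and a single fixed cost is only one simple way to model this.

```python
def ideal_frame_latency(T, P, C):
    """Seconds per frame from the ideal formula T * P / C (C capped at P)."""
    return T * P / min(C, P)


def frame_latency_with_overhead(T, P, C, overhead_s):
    """Ideal latency plus a fixed per-frame overhead (a hypothetical model)."""
    return ideal_frame_latency(T, P, C) + overhead_s


T, P, C = 0.5e-3, 10_000, 10_000   # GPU figures used in this thread
OVERHEAD_S = 0.5e-3                # hypothetical placeholder, not a measured value

ideal = ideal_frame_latency(T, P, C)
actual = frame_latency_with_overhead(T, P, C, OVERHEAD_S)

print(f"ideal        : {ideal:.1e} s/frame, {1.0 / ideal:.0f} frames/s")
print(f"with overhead: {actual:.1e} s/frame, {1.0 / actual:.0f} frames/s")
```

With this placeholder, an overhead equal to the compute time halves the frame throughput, so the theoretical figures are best read as upper bounds.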

Thank you Tiomcat!
I agree with the latency and throughput for GPU

As for the CPU, would the latency for one pixel be 0.000001 sec and for 10,000 pixels 0.01 sec (10,000 x 0.000001)? Is it right to say this?
And the throughput for CPU is 1,000,000 pixels/sec (10,000 threads / 0.01)?

Using the formulae, on the CPU the latency of a single pixel is 1 microsecond, and one image of 10,000 pixels would take 10 milliseconds. The throughput of pixels on the CPU is 1 * (1/T), or 1,000,000 pixels/sec as you said :) It sounds like you have got it.