As I understand it, the threads here are the same concept as in the CPU/multi-threading sense, while GPUs normally have a large number of parallel threads.
They are similar; just be aware of the differences too. With CPU threads, it’s common to start as many “worker” threads as you have processors, and let them pull jobs from a priority queue. They’ll do a job, and when they finish they’ll start another job. Starting 16 threads on a 16-core CPU is a good model. Starting a million CPU threads is not a good idea; that can slow the CPU down and consume too much memory.
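For a concrete picture, here’s a minimal sketch of that worker-pool model in C++. The job contents are placeholders, and the queue is a plain FIFO rather than a priority queue for brevity:

```cpp
#include <algorithm>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

int main() {
    // Pre-fill a queue of jobs (plain FIFO here for brevity).
    std::queue<std::function<void()>> jobs;
    for (int i = 0; i < 1000; ++i)
        jobs.push([i] { /* do job i */ });

    std::mutex m;
    // One worker per hardware thread, e.g. 16 on a 16-core CPU.
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n; ++t) {
        workers.emplace_back([&] {
            for (;;) {
                std::function<void()> job;
                {
                    std::lock_guard<std::mutex> lock(m);
                    if (jobs.empty()) return;     // no work left: worker exits
                    job = std::move(jobs.front());
                    jobs.pop();
                }
                job();                            // run the job outside the lock
            }
        });
    }
    for (auto& w : workers) w.join();
}
```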
With GPU threads, you shouldn’t use that model. CPU threads have a (relatively) large cost to create and destroy, whereas GPU threads have almost no cost. With GPU threads, it’s more common to set the number of threads to the number of separate things you need to compute and save to memory, so it’s common to launch kernels that have a million or even a billion threads.
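Here’s a rough sketch of what that looks like in CUDA - one thread per array element, with the grid rounded up to cover all of them. The `square` kernel is just an illustrative stand-in for real work:

```cpp
#include <cuda_runtime.h>

// One GPU thread per output element. Launching a million threads
// like this is normal and cheap on a GPU.
__global__ void square(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // guard: the grid is rounded up past n
        out[i] = in[i] * in[i];
}

int main() {
    const int n = 1 << 20;          // ~1 million elements
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    int block = 256;
    int grid  = (n + block - 1) / block;  // round up: every element gets a thread
    square<<<grid, block>>>(in, out, n);
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
}
```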
How do I determine the amount of threads that I need?
Normally, it’s the number of individual things you need to compute, e.g., the size of your output array. In a ray tracer, the most common setup is to compute a single color value per pixel, so you start (width*height) threads, where width and height are your image resolution.
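A sketch of that per-pixel launch in CUDA might look like this - `traceRay` is a hypothetical placeholder for your actual per-pixel ray tracing code:

```cpp
#include <cuda_runtime.h>

// Hypothetical placeholder for the real per-pixel ray tracing work.
__device__ float3 traceRay(int x, int y, int width, int height) {
    return make_float3(float(x) / width, float(y) / height, 0.0f);
}

// One thread per pixel: each thread computes and stores one color value.
__global__ void renderPixel(float3* image, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;   // guard for the rounded-up grid
    image[y * width + x] = traceRay(x, y, width, height);
}

// Launch (width * height) threads as a 2D grid covering the image.
void render(float3* image, int width, int height) {
    dim3 block(16, 16);
    dim3 grid((width  + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    renderPixel<<<grid, block>>>(image, width, height);
    cudaDeviceSynchronize();
}
```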
With CPU threads and a work queue, it’s very common to have different-sized chunks of work for each thread. With GPU threads, the single most important performance factor is making sure all threads do the same amount of work, as much as possible. The way a GPU works is that threads are grouped together and execute together - a GPU is (mostly) a SIMD machine, so you should imagine that it’s executing the same instruction for 32 threads at a time, because that’s what is usually happening. If any one of your threads does something different, or computes something unique, all the other 31 threads usually have to wait around for the unique/slow one to finish. You’ll read about thread “divergence” - that term refers to threads taking different paths in the code, which causes delays because the threads have to wait for each other.
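Here’s a contrived CUDA sketch of divergence - `expensiveA` and `expensiveB` are stand-ins for two different workloads. In the first kernel every warp splits between the two branches and serializes; in the second, the same work is arranged so all 32 threads of a warp take the same path:

```cpp
// Stand-ins for two different per-thread workloads.
__device__ float expensiveA(float x) { return sinf(x); }
__device__ float expensiveB(float x) { return cosf(x); }

// Divergent: odd/even threads split inside every 32-thread warp,
// so each warp executes both branches one after the other.
__global__ void divergent(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0)
        data[i] = expensiveA(data[i]);
    else
        data[i] = expensiveB(data[i]);
}

// Same total work, but branching on warp-sized groups of indices:
// all 32 threads of a warp take the same path, so no divergence.
__global__ void uniform(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((i / 32) % 2 == 0)
        data[i] = expensiveA(data[i]);
    else
        data[i] = expensiveB(data[i]);
}
```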
Do more threads help accelerate the ray tracer?
Yes, up to a point. Today’s GPUs have thousands of thread cores. If your per-thread workload is normal and small-ish, you need hundreds of thousands of threads to saturate the GPU and get the maximum throughput performance. Once the GPU is saturated, more threads stop helping, but they never hurt unless you have other factors like too much memory usage. If you have threads that do a lot of compute, then a billion of them won’t be any faster per-thread than a million of them. Bottom line is you can’t have too many threads, but you can have too few.
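If you want a rough number for your own GPU, you can query the device properties - this is just a back-of-the-envelope estimate, and the 4x multiplier is only a rule of thumb for latency hiding, not an exact threshold:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Threads that can be resident on the whole GPU at once.
    int resident = prop.multiProcessorCount * prop.maxThreadsPerMultiProcessor;
    printf("SMs: %d, max resident threads: %d\n",
           prop.multiProcessorCount, resident);

    // Rule of thumb: launch several times the resident count so the
    // scheduler always has warps ready while others wait on memory.
    printf("a launch of roughly %d+ threads should keep this GPU busy\n",
           resident * 4);
}
```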
Is there a cost for assigning many threads?
Basically no. The main cost is whatever memory you use and/or computation you do per-thread. You might be able to speed things up by combining threads together that share memory lookups, or share computation, as long as your threads are still saturating the GPU and doing similar sized amounts of work.
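As a sketch of that idea, here’s a hypothetical kernel where each thread produces four adjacent outputs that all share one lookup from a table `lut`, so the value is fetched once instead of four times (and you’d launch n/4 threads instead of n):

```cpp
// Each thread writes 4 adjacent outputs that share one table lookup,
// so lut[] is read once per 4 results instead of once per result.
// Launch with n/4 threads (rounded up) instead of n.
__global__ void combined(const float* lut, const float* in, float* out, int n) {
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * 4;
    if (base >= n) return;
    float shared_val = lut[base / 4];      // one lookup, shared by 4 outputs
    for (int k = 0; k < 4 && base + k < n; ++k)
        out[base + k] = in[base + k] * shared_val;
}
```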
But the ray tracer should work even on a single thread, if it is properly set up.
While this is true, a single-threaded GPU program usually performs far worse than a single-threaded CPU program. It will go much, much slower. In order to use a GPU effectively, you should be using a lot of small bite-sized threads that each do as much uniformly similar work as possible.
–
David.