I know that if I have an 8-core CPU, I can create 8 threads and pin each one to its own core, so that each core handles one thread's work and ignores the other threads' work. This is what we call affinity. I'd like to know whether I can do something similar on a GPU, and if so, how? (It would be very helpful, since GPUs have so many cores.)
There is no need to set affinity on the GPU: once a thread block is scheduled, its threads stay on the same SM (on the same warp scheduler, even) except in very rare situations (e.g., calling cudaDeviceSynchronize() from device code, or during debugging).
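You can observe this block-to-SM pinning yourself. A minimal sketch, assuming a Linux box with the CUDA toolkit installed; it reads the informational PTX special register `%smid` (note the hardware chooses which SM each block lands on, it only guarantees the block does not migrate while resident):

```cuda
#include <cstdio>

// Read the ID of the SM this thread is currently executing on.
__device__ unsigned smid() {
    unsigned id;
    asm("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

__global__ void whereAmI() {
    // One printout per block: every thread of a block reports the same SM,
    // and that SM stays fixed for the lifetime of the block.
    if (threadIdx.x == 0)
        printf("block %d runs on SM %u\n", blockIdx.x, smid());
}

int main() {
    whereAmI<<<8, 32>>>();
    cudaDeviceSynchronize();
    return 0;
}
```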
Yes, there is a need for GPU affinity.
Say I use multithreading on the host and each host thread launches its own CUDA kernel (or Thrust/CUB call). Without GPU affinity there would be contention on the GPU; i.e., the different kernels would have to wait for one another to be scheduled. Right?
With GPU affinity, on the other hand, I could do something like run each kernel on a different block. If I instead encapsulated the logic of all these different kernels into a single, bigger kernel, then I'd have thread divergence, wouldn't I?
I suspect I could mitigate some of these issues with CUDA streams (am I wrong?), but affinity would be a much more powerful abstraction.
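For the launch-contention part, streams do address it directly: kernels issued to different streams have no implied ordering and may run concurrently if the GPU has the resources, with no affinity needed. A minimal sketch (the kernels `workA`/`workB` are made-up stand-ins for what two host threads might each launch):

```cuda
__global__ void workA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f;   // stand-in for one host thread's kernel
}

__global__ void workB(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] + 1.0f;   // stand-in for another host thread's kernel
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    // One stream per logical unit of work; launches in different streams
    // are unordered with respect to each other and may overlap on the GPU.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    workA<<<(n + 255) / 256, 256, 0, s1>>>(a, n);
    workB<<<(n + 255) / 256, 256, 0, s2>>>(b, n);

    cudaDeviceSynchronize();   // wait for both streams to finish

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Whether the two kernels actually overlap depends on occupancy: if one kernel's grid already fills every SM, the other waits regardless of streams, which is where the desire to partition SMs (i.e., affinity) comes from.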