Hi,
I had a program with 4 possible kernels (one of them was a large GEMM). I expected to see parallel execution of those kernels, but they ran in parallel only in some rare cases. Now, by decomposing the large GEMM into 3 micro GEMMs and one large GEMM, I am seeing that in most cases the kernels are running in parallel.
I have two questions.
First, what is the name of the manager, scheduler, or runtime system that allocates resources to a kernel?
Second, what criteria determine how resources are allocated to kernels? Why was parallel execution rare before, while now it happens in almost all cases but with a drop in performance? I want to understand the process of allocating resources and assigning them to a kernel.
I would be grateful if you could point me to some documents or papers on this matter.