Are there any strategies for mitigating the slowdown caused by dynamic parallelism overhead? I’m currently seeing slowdowns of 20x to over 100x just from launching a single empty, do-nothing thread from my host-launched kernels.