Hi, I have recently been doing research on workload balancing on GPUs. One goal is to assign different tasks (kernels/blocks/warps/others) to different SMs so that each SM’s different units (FP32, INT32, LSU…) work concurrently.
The GPU has an internal scheduler that assigns blocks to different SMs and warps to different SMSPs. Is there any way to pin a block to a specific SM? Or is there any other method to bypass the GPU block scheduler?