Partition your work into kernels that are short enough to keep the screen (kind of) responsive, but long enough that the launch overhead does not load the CPU significantly.
Then, if you are on Windows using the WDDM driver, call cudaStreamQuery(0) between kernel launches so that the driver does not batch them up again and the OS gets access to the GPU in between.
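
A minimal sketch of that pattern, assuming the work is an element-wise pass over a device array. The kernel scaleChunk, the helpers numChunks and scaleInChunks, and the block size of 256 are hypothetical names and values chosen for illustration; only cudaStreamQuery(0) comes from the advice above. The right chunk size depends on your GPU and kernel, so treat it as something to tune rather than a recommendation.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical workload: scale one slice of a device array.
__global__ void scaleChunk(float *data, size_t offset, size_t count, float factor)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < count)
        data[offset + i] *= factor;
}

// Number of kernel launches needed to cover n elements in chunks of chunkSize.
static size_t numChunks(size_t n, size_t chunkSize)
{
    return (n + chunkSize - 1) / chunkSize;
}

// Process d_data (n floats, already on the device) in several short kernels
// instead of one long one. chunkSize must be greater than zero.
static cudaError_t scaleInChunks(float *d_data, size_t n, size_t chunkSize, float factor)
{
    const unsigned threads = 256;  // assumed block size
    const size_t chunks = numChunks(n, chunkSize);

    for (size_t c = 0; c < chunks; ++c) {
        size_t offset = c * chunkSize;
        size_t count  = (n - offset < chunkSize) ? (n - offset) : chunkSize;
        unsigned blocks = (unsigned)((count + threads - 1) / threads);

        scaleChunk<<<blocks, threads>>>(d_data, offset, count, factor);

        // Non-blocking query of the default stream. It returns cudaSuccess or
        // cudaErrorNotReady; the return value is ignored because the call is
        // only there to push the queued launch to the GPU under WDDM.
        cudaStreamQuery(0);
    }

    // Wait for all chunks and report any launch or execution error.
    return cudaDeviceSynchronize();
}
```

To use it, allocate and fill d_data with cudaMalloc and cudaMemcpy, call something like scaleInChunks(d_data, n, 1 << 20, 2.0f), and check that the result is cudaSuccess. Start with a chunk size that makes each kernel run for a few milliseconds and adjust from there.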