L1 cache and register usage are not under direct user control. Both are used, but the back-end device compiler performs the register allocation and the runtime manages the L1 cache. You can, however, limit the number of registers each thread may use via the “-Mcuda=maxregcount:n” flag, where n is the register limit.
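As a rough sketch of how the flag might be used, the snippet below builds a compile line that caps registers at 32 per thread and also asks for the PTX assembler's resource report via the "ptxinfo" suboption, so you can see how many registers each kernel ends up with. The file name saxpy.cuf and the limit of 32 are placeholders, not values from this thread; pick a limit that suits your kernel.

```shell
# Hypothetical values: adjust the limit and source file for your own code.
MAXREG=32
SRC=saxpy.cuf

# Suboptions of -Mcuda are comma-separated; ptxinfo prints per-kernel
# register and memory usage during compilation.
FLAGS="-Mcuda=maxregcount:${MAXREG},ptxinfo"

if command -v pgfortran >/dev/null 2>&1; then
  pgfortran $FLAGS -c "$SRC"
else
  # Compiler not on PATH: just show the command that would be run.
  echo "would run: pgfortran $FLAGS -c $SRC"
fi
```

Setting the limit too low can force the compiler to spill registers to local memory, so it is worth comparing the ptxinfo output and the kernel timing with and without the cap.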