Flushing Instruction Cache on GPU

Hello everyone,

Does anyone know how to flush the instruction cache, either using the debugger interface provided by libcuda or through some operation inside the kernel function? I realize it may be impossible, but I want to know if there is a way.

I know I am not supposed to mess with the code on the device, but I want to experiment with modifying CUDA code after kernel launch. I did find a way to write to code space, but the instruction cache prevents me from actually executing the modified instructions.

I appreciate any help you can provide.

Self-modifying code is not supported and I really don’t think there’s any way for a user to flush the icache…

Well, you could always try to overflow the cache by running a long routine consisting of a million NOPs or so. That would presumably evict whatever code was in the cache, so the device would have to reload from your modified instruction space once you actually execute that part of the code.
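For what it's worth, here is a rough sketch of how one might produce such a filler routine. As far as I know, CUDA C has no statement that emits a NOP directly, so this swaps in straight-line xorshift rounds as padding instead. It is plain host-side C++ that builds the source text for a `__device__` function. The names `make_filler_source` and `icache_filler` are made up for illustration and are not part of CUDA or libcuda. Whether the compiled routine is actually large enough to evict anything depends on the icache size and on what the compiler keeps, neither of which I can vouch for.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a CUDA or libcuda API): returns CUDA C source for a
// __device__ function whose body is `steps` straight-line xorshift rounds.
// The rounds stand in for NOPs; each one depends on the previous value of x,
// so the compiler has little room to drop them as long as the result is used.
std::string make_filler_source(std::size_t steps) {
    std::string src = "__device__ unsigned icache_filler(unsigned x) {\n";
    for (std::size_t i = 0; i < steps; ++i) {
        // One filler round per source line, fully unrolled by construction.
        src += "    x ^= x << 13; x ^= x >> 17; x ^= x << 5;\n";
    }
    src += "    return x;\n}\n";
    return src;
}
```

You would write the returned string into a .cu file (or hand it to a runtime compiler), build it into the same module as your kernel, and have the kernel store the return value somewhere so the calls are not optimized away. Pick `steps` by experiment, since the cache size is not documented in this thread.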

There’s also a small chance that kernel invocations flush the cache. The actual internals of kernel calls would largely determine if this would work or not, since an obvious optimization would be to not flush the cache if it can be avoided.

Thank you guys.

Executing a bunch of NOPs would work, but the same problem remains: how do I insert the NOPs without flushing the instruction cache first? The device will still execute whatever is in the cache, and I don’t want that. NOPs would only work if I inserted them before launching the kernel. Am I missing something?


How did you even find the global memory location where instructions are stored??? That looks like a good effort! (scanning for instruction patterns in memory???)

Well, I had to read portions of libcuda assembly.

I downloaded the cuda-gdb code from ftp://download.nvidia.com/CUDAOpen64/ . It comes with a header file called cudadebugger.h. There is a function called readCodeMemory declared in this header file, and the actual implementation resides in libcuda.so. This readCodeMemory function eventually calls memcpy to copy the code from device memory to the host. I changed the parameters to this memcpy operation, and voila! I copied from host to device, overwriting code on the device. Since I could not flush the instruction cache, the overwrite made little difference for code that was already cached. When I modified some code that did not fit in the cache, the output changed as I expected. So, now I am trying to flush the instruction cache.

See Reverse engineering of the CUDA communication with the driver
I scanned it, and there’s nothing even remotely close to any on-device cache management, which means either the hardware or the driver manages it. My guess would be that it is completely hardware-managed.

Anyway, it seems you need to look into the driver itself.

Edit: also see this, if you haven’t already. It seems to suggest that the icache is unified with the constant cache at some point, but it lacks any details. Maybe it will help you.