I am doing a small research project comparing the memory behavior of a CPU process with that of a GPU process. The basic idea is quite simple: the same program is built once for CPU execution and once for GPU execution with CUDA.
I can easily get the CPU trace by logging the dynamic memory accesses of the CPU process with a debugger. However, since the debugger only reaches the CPU address space, it seems to me that there is no way to get the GPU trace this way.
Is there any way or tool to log a dynamic memory trace of the CUDA code running on the GPU?
Any comments would be greatly appreciated.