Logging the trace of memory accesses in the GPU

Hello,

I am doing a small research project comparing the memory behavior of a CPU process with that of a GPU process. The basic idea is quite simple: the same program is built once for CPU execution and once for GPU execution with CUDA.

I can easily get the CPU trace by logging the dynamic trace of the CPU process with a debugger. However, since the debugger only reaches the CPU address space, it seems to me that there is no way to get the GPU trace at all.

Is there any way/tool to log the dynamic CUDA trace?
Any comments would be greatly appreciated.

Thank you.

Set the environment variable CUDA_PROFILE to 1 and then run your app. There is more documentation available in the doc/ directory of the toolkit download.
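
If it helps, the variable can also be set from inside the program rather than the shell. A minimal sketch, assuming Linux setenv and that CUDA_PROFILE is read when the CUDA context is created, i.e. before the first CUDA call:

// Minimal sketch: enable the built-in profiler from within the program.
// Assumes Linux (setenv) and that CUDA_PROFILE is read at context creation.
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void dummy(float *p) { p[threadIdx.x] = (float)threadIdx.x; }

int main()
{
    setenv("CUDA_PROFILE", "1", 1);      // same effect as CUDA_PROFILE=1 ./app

    float *d = 0;
    cudaMalloc((void**)&d, 32 * sizeof(float));
    dummy<<<1, 32>>>(d);
    cudaThreadSynchronize();             // make sure the kernel shows up in the log
    cudaFree(d);
    return 0;                            // the profiler log is written at exit
}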

Doesn’t running your application in debug mode cause performance problems though?

Thanks for the reply. In fact, I already know how to get the profile as you described; however, that profiling output is too brief to be used as a trace.

I need a more detailed trace, something like the dynamic trace (or dynamic stream) of the disassembled instructions running on the GPU, or at least more detailed information about how many bytes are transferred in each CUDA API call that touches the GPU.
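
For the host-visible part I could imagine wrapping the API calls myself and logging the transfer sizes; a rough sketch of the kind of per-call byte logging I mean (loggedMemcpy is just a name I made up), though this still tells me nothing about what the kernels do internally:

// Rough sketch: log the byte count of every explicit host<->device transfer
// by routing cudaMemcpy through a small wrapper.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

static cudaError_t loggedMemcpy(void *dst, const void *src, size_t count,
                                cudaMemcpyKind kind)
{
    fprintf(stderr, "cudaMemcpy kind=%d bytes=%zu\n", (int)kind, count);
    return cudaMemcpy(dst, src, count, kind);
}

int main()
{
    const size_t n = 1 << 20;
    float *h = (float*)malloc(n * sizeof(float));
    float *d = 0;
    cudaMalloc((void**)&d, n * sizeof(float));

    loggedMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);   // logs 4 MB H2D
    loggedMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);   // logs 4 MB D2H

    cudaFree(d);
    free(h);
    return 0;
}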

Is there any existing way or tool to capture that kind of detail, especially for what happens on the device side?

Thanks again.

No such tracing tool exists on the GPU. When I perform these kinds of comparisons, I just go through the kernel by hand and count the number of global memory reads and such, then output statistics based on the counted values. But then, my kernels perform a very predictable set of memory reads based on their input, so this is relatively easy to do.
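
To illustrate what I mean by counting by hand, here is a toy sketch (not from a real project) for a kernel whose per-thread accesses are fixed; the byte totals fall straight out of the launch configuration:

// Toy sketch of the count-by-hand approach for a predictable kernel:
// y[i] = a * x[i] + y[i], so each active thread does 2 global float reads
// and 1 global float write.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x = 0, *y = 0;
    cudaMalloc((void**)&x, n * sizeof(float));
    cudaMalloc((void**)&y, n * sizeof(float));

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaThreadSynchronize();

    // Statistics based on the hand-counted accesses above.
    printf("expected global reads : %zu bytes\n", (size_t)(2ull * n) * sizeof(float));
    printf("expected global writes: %zu bytes\n", (size_t)(1ull * n) * sizeof(float));

    cudaFree(x);
    cudaFree(y);
    return 0;
}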

You can use the performance counters in the 1.1 profiler to get a measure of how many warp reads are performed on a single multiprocessor, but that again lacks byte-count information AFAIK. Maybe it counts a float4 read as 4 reads; I'm not sure.
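
The byte arithmetic itself is easy to do by hand either way; here is a sketch of the two cases (how the counter reports the float4 load is exactly the part I'm unsure about):

// Sketch of the float vs. float4 byte arithmetic. The bytes moved per
// thread are unambiguous; how the profiler counter tallies the float4
// load is the open question.
#include <cuda_runtime.h>

__global__ void readFloat(const float *in, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i];            // 4 bytes read per thread
}

__global__ void readFloat4(const float4 *in, float4 *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i];            // 16 bytes read per thread in one vector load
}

int main()
{
    const int n = 1 << 20;
    float  *a = 0, *b = 0;
    float4 *c = 0, *d = 0;
    cudaMalloc((void**)&a, n * sizeof(float));
    cudaMalloc((void**)&b, n * sizeof(float));
    cudaMalloc((void**)&c, n * sizeof(float4));
    cudaMalloc((void**)&d, n * sizeof(float4));

    readFloat <<<n / 256, 256>>>(a, b);    // n * 4  bytes read in total
    readFloat4<<<n / 256, 256>>>(c, d);    // n * 16 bytes read in total
    cudaThreadSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(c); cudaFree(d);
    return 0;
}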

Thank you very much. I will give it a try.

I really appreciate your help.

Be sure to read the profiler docs. I think the counts it returns are for just one multiprocessor (or maybe one block), so they are good for relative timing, but extrapolating them to absolute counts for the whole kernel will take some multiplication.
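
As a concrete example of that extrapolation (purely hypothetical numbers; whether the right factor is the number of blocks or the number of multiprocessors is what the docs need to confirm):

// Sketch of scaling a per-block or per-multiprocessor counter value up to
// a whole-kernel estimate. All numbers here are hypothetical.
#include <cstdio>

int main()
{
    long long counterValue       = 2048;   // value reported by the profiler
    long long numBlocks          = 4096;   // grid size of the kernel launch
    long long numMultiprocessors = 16;     // e.g. a G80-class GPU

    printf("if the count is per block:          ~%lld total\n",
           counterValue * numBlocks);
    printf("if the count is per multiprocessor: ~%lld total\n",
           counterValue * numMultiprocessors);
    return 0;
}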