Nvprof and visual profiler about memory and cache access again

Hello, I have a Jetson nano. I want to run a deep learning inference program in it and analyze its memory access. The inference program is executed by the python 3 interpreter, which inputs a picture into the built-in neural network of torch, such as resnet50, and outputs the inference result.
Question: I want to know how much memory is accessed by the data from DRAM to L2, then to L1, and then to the kernel when the program is running.
Here is my analysis process:

(1) I plan to use nvprof to analyze the program.
Because it is CC5 3. Therefore, it is not supported to collect the memory accesses from DRAM to L2, as shown in the figure below (reference: https://docs.nvidia.com/cuda/profiler-users-guide/index.html#metrics-reference-5x))


(2)however, according to the output of nvprof – query metrics, as shown in the following figure:

I can collect gld_transactions, converted to MB (gld_transactions * 4 / 1024 / 1024 (MB)), can it represent the number of bytes read by the kernel from L1 cache?
And L2_ global_ load_ Bytes, converted to MB (l2_global_load_bytes / 1024 / 1024 (MB)), can it represent the number of bytes read by L1 from L2?
(3)Since nvprof’s metrics cannot collect the amount of DRAM memory accessed by L2, I found the visual profiler tool in the previous link, which can analyze the memory flow:
image
However, I found that the article pointed out that the Jetson nano cannot directly use visual Profiler:

(4)So when I execute the program, I use the following statements:
sudo /usr/local/cuda/bin/nvprof -o tf-resnet50.nvvp python3 resnet50-infer.py
Save the analysis results to tf-resnet50.nvvp, use the same version of visual profiler on the PC to tf-resnet50.nvvp analysis yielded the following results:

Does the total bytes shown in the bottom right corner of memcpy (htod) represent the amount of memory accessed by L2 cache from DRAM?If yes, the value corresponding to memcpy (htod) can be collected in Jetson nano using nvprof – which metric in query metrics?Because the data of visual profiler is collected by nvprof.

It can be summarized into three questions:

(1) gld_transactions, converted to MB (gld_transactions * 4 / 1024 / 1024 (MB)), can it represent the number of bytes read by the kernel from L1 cache?

(2)l2_ global_ load_ Bytes, converted to MB (l2_global_load_bytes / 1024 / 1024 (MB)), can it represent the number of bytes read by L1 from L2?

(3) Does memcpy (htod) in visual profiler represent the amount of data read from DRAM by L2 cache? If yes, the value corresponding to memcpy (htod) can be collected in Jetson nano using nvprof – which metric in query metrics?

Because I couldn’t get any answer, I had to ask again.
These questions have bothered me for a long time. If you can answer them patiently, I will be very grateful.

Duplicated with Nvprof and visual profiler about memory and cache access? - Jetson & Embedded Systems / Jetson Nano - NVIDIA Developer Forums

Yes, it was posted by me, but there was no response, so I asked again.

Can you help solve this problem?

Please don’t expect we can support you over the weekend. Thanks

OK, I think so, so I’ll ask you again

So do you have any good suggestions on this issue?

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.