Weird memory write bytes reported by nv-nsight-cu-cli

yd11130055p1 · January 5, 2021, 4:20am

I modified the 0_Simple/vectorAdd sample to add 5,000,000 elements.
However, Nsight Compute reports fewer DRAM write bytes than I expect: the output vector should account for 20 MB, but it reports 17 MB.
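
For reference, the 20 MB figure is plain arithmetic. Below is a minimal sketch of it, assuming 4-byte floats (the kernel signature takes `float` pointers), decimal megabytes (10^6 bytes), and that every element crosses DRAM exactly once with no cache effects. The function names are made up for illustration and are not part of the sample:

```python
import math

BYTES_PER_FLOAT = 4  # assumption: 32-bit floats, as in the kernel signature


def expected_dram_traffic(num_elements, bytes_per_element=BYTES_PER_FLOAT):
    """Return (read_bytes, write_bytes), assuming each element touches DRAM once."""
    read_bytes = 2 * num_elements * bytes_per_element  # two input vectors, A and B
    write_bytes = num_elements * bytes_per_element     # one output vector, C
    return read_bytes, write_bytes


def blocks_needed(num_elements, threads_per_block=256):
    """Grid size when each thread handles one element (rounded up)."""
    return math.ceil(num_elements / threads_per_block)


if __name__ == "__main__":
    n = 5_000_000
    read_bytes, write_bytes = expected_dram_traffic(n)
    print(f"expected read:  {read_bytes / 1e6:.2f} MB")
    print(f"expected write: {write_bytes / 1e6:.2f} MB")
    print(f"blocks of 256 threads: {blocks_needed(n)}")
```

Under these assumptions the read side (40 MB) and the block count (19532) agree with the profiler output below; only the write side differs.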

With this command:

nv-nsight-cu-cli --sampling-interval 0 --metric dram__bytes_read.sum,dram__bytes_write.sum ./vectorAdd

I get:

[Vector addition of 5000000 elements]
==PROF== Connected to process 61664 (/scale/cal/home/jungwk/NVIDIA_CUDA-11.2_Samples/0_Simple/vectorAdd/vectorAdd)
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 19532 blocks of 256 threads
==PROF== Profiling "vectorAdd" - 1: 0%....50%....100% - 1 pass
Copy output data from the CUDA device to the host memory
Test PASSED
Done
==PROF== Disconnected from process 61664
[61664] vectorAdd@127.0.0.1
vectorAdd(float const*, float const*, float*, int), 2021-Jan-05 13:14:30, Context 1, Stream 7
Section: Command line profiler metrics
---------------------------------------------------------------------- --------------- ------------------------------
dram__bytes_read.sum                                                             Mbyte                          40.00
dram__bytes_write.sum                                                            Mbyte                          17.00
---------------------------------------------------------------------- --------------- ------------------------------

Why is it slightly less than 20 MB? From further experiments I found that, while the read amount is as expected, about 3 MB is always missing from the DRAM write bytes, regardless of the problem size.
Is this normal behavior?

I tested on a Tesla V100 16GB with Nsight Compute version 2020.3.0.0 (build 29307467).
