Question about dram_read_transactions

devoch1217 · September 22, 2024, 3:45am

Hi,

I am new to CUDA programming and am trying some very basic profiling to verify my understanding of device memory reads. As I understand it, memory is read in 128-byte chunks in most cases, so the number of transactions should increase as the stride increases. However, with the following code I sometimes see fewer transactions with a larger stride than with a smaller one. Below are the code and the command I used to run nvprof.

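#include <cstdlib>
#include <iostream>

// Single-thread kernel: reads num_reads floats from x, spaced `stride`
// elements apart, and accumulates them into y[0].
__global__
void read(float *x, float *y, int stride, int num_reads) {
    int index = 0;
    int lim = num_reads * stride + index;
    for (int i = index; i < lim; i += stride){
        y[0] += x[i];
    }
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        std::cerr << "Usage: " << argv[0] << " <stride> <num_reads>" << std::endl;
        return 1;
    }
    int stride = atoi(argv[1]);
    int num_reads = atoi(argv[2]);
    std::cout << "stride = " << stride << std::endl;
    int N = 1<<20;
    // Guard against reading past the end of x (and against a zero stride).
    if (stride < 1 || num_reads < 1 || (long long)(num_reads - 1) * stride >= N) {
        std::cerr << "stride and num_reads must be positive and fit in " << N << " elements" << std::endl;
        return 1;
    }
    float *x, *y;
    cudaMallocManaged(&x, N*sizeof(float));
    cudaMallocManaged(&y, sizeof(float));
    // Initialize the managed buffers on the host.
    for (int i = 0; i < N; i += 1){
        x[i] = 1.0f;
    }
    y[0] = 0.0f;
    // One block with one thread.
    read<<<1, 1>>>(x, y, stride, num_reads);
    cudaDeviceSynchronize();
    std::cout << "Result = " << y[0] << std::endl;
    cudaFree(x);
    cudaFree(y);
    return 0;
}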

Profiling command: nvprof --metrics dram_read_transactions ./read_test <stride> <num_reads>

Results:
==1726484== Profiling application: ./read_test 1 1000
==1726484== Profiling result:
==1726484== Metric result:
Invocations    Metric Name               Metric Description                 Min    Max    Avg
Device "Tesla V100-SXM2-16GB (0)"
    Kernel: read(float*, float*, int, int)
          1    dram_read_transactions    Device Memory Read Transactions     75     75     75

==1726509== Profiling application: ./read_test 512 1000
==1726509== Profiling result:
==1726509== Metric result:
Invocations    Metric Name               Metric Description                 Min    Max    Avg
Device "Tesla V100-SXM2-16GB (0)"
    Kernel: read(float*, float*, int, int)
          1    dram_read_transactions    Device Memory Read Transactions     60     60     60

I greatly appreciate your help on this matter!
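
As a point of reference for the expectation described in the question, the number of distinct memory chunks the loop touches can be worked out on the host. The sketch below is plain host-side C++ (it should also build with nvcc). `distinct_chunks` is a hypothetical helper, not part of the original program. It assumes the buffer starts on a chunk boundary, and it only counts addresses; it does not model caches or what the profiler actually reports. The 128-byte chunk size is the one named in the question; any other size passed in (for example 32 bytes) is an assumption about the hardware's transaction granularity.

```cpp
#include <cstddef>
#include <set>

// Count how many distinct aligned chunks of `chunk_bytes` are touched when
// reading `num_reads` floats spaced `stride` elements apart, starting at
// element 0 of a buffer assumed to begin on a chunk boundary.
std::size_t distinct_chunks(std::size_t stride, std::size_t num_reads,
                            std::size_t chunk_bytes) {
    std::set<std::size_t> chunks;
    for (std::size_t k = 0; k < num_reads; ++k) {
        // Byte offset of the k-th element read by the loop.
        std::size_t byte_offset = k * stride * sizeof(float);
        // Index of the aligned chunk that contains this offset.
        chunks.insert(byte_offset / chunk_bytes);
    }
    return chunks.size();
}
```

Under these assumptions, `distinct_chunks(1, 1000, 128)` gives 32 and `distinct_chunks(512, 1000, 128)` gives 1000; with 32-byte chunks the values are 125 and 1000. Both measured values above (75 and 60) are far below 1000, so the stride-512 result does not fit a picture in which every element read costs one DRAM transaction.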

Robert_Crovella · September 23, 2024, 2:12am

When posting code on these forums, please format it.

A simple method would be to edit your post by clicking the pencil icon below it, select the code, then click the </> button at the top of the edit pane, then save your changes.

Please do that now.

devoch1217 · September 23, 2024, 2:19am

Thank you for your suggestion. I have edited my post.

Curefab · September 23, 2024, 1:05pm

The compiler may either keep y[0] in a register or re-read it from memory on every iteration of the loop. Also, with a single thread (<<<1,1>>>), you may see memory accesses that are not directly related to the instructions of your kernel. Sometimes a handful of additional operations are performed, probably for initialization purposes.
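
One way to take the y[0] question out of the measurement is to accumulate into a local variable and store the result once. The following is a minimal sketch of that idea, not the original poster's code: the kernel name `read_local_sum` is hypothetical, the stride and read count are hard-coded for illustration, and it uses `cudaMalloc`/`cudaMemcpy` instead of managed memory on the assumption that this keeps unified-memory page migration out of the kernel's run. Whether that changes the reported `dram_read_transactions` is something to check with the profiler, not something this sketch guarantees.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Variant of the thread's kernel: the running sum is a local variable
// (normally kept in a register) and y[0] is written exactly once.
__global__ void read_local_sum(const float *x, float *y, int stride, int num_reads) {
    float sum = 0.0f;
    for (int k = 0; k < num_reads; ++k) {
        sum += x[(size_t)k * stride];   // the only global loads in the loop
    }
    y[0] = sum;                         // single global store
}

int main() {
    const int N = 1 << 20;
    const int stride = 512;             // illustrative values; they must satisfy
    const int num_reads = 1000;         // (num_reads - 1) * stride < N

    std::vector<float> h_x(N, 1.0f);
    float h_y = 0.0f;
    float *d_x = nullptr, *d_y = nullptr;

    // Explicit device allocations instead of cudaMallocManaged.
    if (cudaMalloc(&d_x, N * sizeof(float)) != cudaSuccess ||
        cudaMalloc(&d_y, sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_x, h_x.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    read_local_sum<<<1, 1>>>(d_x, d_y, stride, num_reads);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(&h_y, d_y, sizeof(float), cudaMemcpyDeviceToHost);
    printf("Result = %f\n", h_y);

    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}
```

It can be profiled with the same nvprof command as above. Comparing its counts against the original kernel at the same stride would show how much of the difference comes from the y[0] traffic and how much from other sources.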
