Low bandwidth of memory copy among CPU

Freffy · November 17, 2016, 1:34pm

Hello everyone,

Unfortunately, we have run into another issue while developing our TX1-based system: the measured bandwidth of a CPU memory copy is only about 3.5 GB/s, whereas the theoretical bandwidth should be up to 12.8 GB/s, since the DDR frequency is 1.6 GHz and it is a 64-bit architecture. Even allowing for some efficiency loss, 3.5 GB/s seems rather low.

The test code is as follows:

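#include <stdio.h>      /* printf */
#include <stdlib.h>     /* malloc */
#include <string.h>     /* memset, memcpy */
#include <sys/time.h>   /* gettimeofday, struct timeval */

#define MEM_SIZE (10*1024*1024)   /* 10 MB per buffer */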

int main (void)
{
    void *src, *dest;
    struct timeval tv1, tv2;

    src = malloc(MEM_SIZE);
    if (!src) {
        printf("failed to allocate memory src\n");
        return -1;
    }
    memset(src, 0, MEM_SIZE);

    dest = malloc(MEM_SIZE);
    if (!dest) {
        printf("failed to allocate memory dest\n");
        return -1;
    }
    memset(dest, 0, MEM_SIZE);

    while (1) {
        gettimeofday(&tv1, NULL);
        memcpy(dest, src, MEM_SIZE);
        gettimeofday(&tv2, NULL);
        /* the elapsed time is a long, so print it with %ld rather than %d */
        printf("memcopy once %ld us\n", (long)(tv2.tv_sec-tv1.tv_sec)*1000000+(long)(tv2.tv_usec-tv1.tv_usec));
        //usleep(30000);
    }

    return 0;
}

The test shows that a CPU memory copy of 10 MB takes approximately 2.8 ms, which works out to a bandwidth of about 3.5 GB/s.

There is clearly a large gap between the measured performance and the theoretical figure. Is this expected, or is there a problem with my test?

I’d appreciate it if anyone would reply ^^

kayccc · November 25, 2016, 6:32am

Hi Freffy,

Please run the performance-maximizing script from the page below and see whether it makes any difference:
http://elinux.org/Jetson/TX1_Controlling_Performance

Thanks
