Low bandwidth of CPU memory copy

Hello everyone,

Unfortunately, we've run into another issue while developing our TX1-based system: the measured bandwidth of a CPU memory copy is only about 3.5GB/s, whereas the theoretical bandwidth is up to 12.8GB/s, since the DDR frequency is 1.6GHz and it's a 64-bit architecture. Some efficiency loss is to be expected, but 3.5GB/s seems rather low.
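For reference, the 12.8GB/s figure can be reproduced with a small sketch. It assumes 1.6GHz means 1.6e9 transfers per second on a 64-bit wide data path, which is how the number above appears to be derived. The function name peak_bandwidth_bytes_per_sec is made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: theoretical peak bandwidth in bytes per second.
 * Assumes one bus-wide transfer per cycle of the quoted frequency. */
double peak_bandwidth_bytes_per_sec(double transfers_per_sec,
                                    unsigned bus_width_bits)
{
    /* The bus width must be a whole number of bytes. */
    assert(bus_width_bits % 8 == 0);
    return transfers_per_sec * (bus_width_bits / 8.0);
}
```

Calling peak_bandwidth_bytes_per_sec(1.6e9, 64) gives 1.6e9 * 8 bytes, i.e. 12.8e9 bytes per second.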

The test code is as follows:

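#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define MEM_SIZE (10*1024*1024)

int main (void)
{
    void *src, *dest;
    struct timeval tv1, tv2;

    src = malloc(MEM_SIZE);
    if (!src) {
        printf("failed to allocate memory src\n");
        return -1;
    }
    /* touch every page so it is mapped before timing starts */
    memset(src, 0, MEM_SIZE);

    dest = malloc(MEM_SIZE);
    if (!dest) {
        printf("failed to allocate memory dest\n");
        free(src);
        return -1;
    }
    memset(dest, 0, MEM_SIZE);

    /* runs until interrupted, printing the time of each 10MB copy */
    while (1) {
        gettimeofday(&tv1, NULL);
        memcpy(dest, src, MEM_SIZE);
        gettimeofday(&tv2, NULL);
        /* cast to long so the value matches the %ld format */
        printf("memcopy once %ld us\n", (long)((tv2.tv_sec-tv1.tv_sec)*1000000+(tv2.tv_usec-tv1.tv_usec)));
    }

    return 0;
}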

The test shows that a CPU memory copy of 10MB takes approximately 2.8ms, which gives a bandwidth of about 3.5GB/s.
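The conversion from the printed time to a bandwidth figure can be sketched as follows. The helper names are made up for illustration, and the sketch counts only the bytes passed to memcpy. Since a copy both reads and writes memory, the actual traffic on the memory bus may be closer to twice that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy bandwidth in bytes per second,
 * given the number of bytes copied and the elapsed time in microseconds. */
double copy_bandwidth_bytes_per_sec(size_t bytes, double elapsed_us)
{
    assert(elapsed_us > 0.0);
    return (double)bytes / (elapsed_us / 1e6);
}

/* Convert bytes per second to GiB per second (1 GiB = 2^30 bytes). */
double bytes_per_sec_to_gib(double bytes_per_sec)
{
    return bytes_per_sec / (1024.0 * 1024.0 * 1024.0);
}
```

With 10*1024*1024 bytes and 2800 us this comes out at roughly 3.49 GiB/s, or about 3.74e9 bytes per second in decimal units, which matches the 3.5GB/s quoted above.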

There is clearly a large gap between the practical and the ideal performance. Or is there a problem with my test?

I’d appreciate it if anyone would reply ^^

Hi Freffy,

Please run the performance-maximizing script below and see whether it makes any difference: