# Matrix Multiplication is slow on Denver

Hi, I am comparing Denver and A57 performance on the TX2 using a matrix multiplication benchmark.
When the matrix size is greater than 64, Denver is slower than A57. Performance also differs (Denver is again the slower core) depending on whether the matrices live on the heap or on the stack.
Code:

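```cpp
#include <cstdlib>   // atoi
#include <iostream>
#include <omp.h>     // omp_get_wtime; build with -fopenmp

#define HEAP 0       // 1: row-by-row heap allocation, 0: stack arrays

using namespace std;

void warmup2(int SIZE) {
#if HEAP
    // Each row is a separate allocation, so rows are not guaranteed to be contiguous.
    double** a = new double*[SIZE];
    double** b = new double*[SIZE];
    double** c = new double*[SIZE];
    for (int i = 0; i < SIZE; ++i) {
        a[i] = new double[SIZE];
        b[i] = new double[SIZE];
        c[i] = new double[SIZE];
    }
#else
    // Variable-length arrays on the stack (a compiler extension in C++).
    double a[SIZE][SIZE];
    double b[SIZE][SIZE];
    double c[SIZE][SIZE];
#endif
    int i, j, k;

    /*** Initialize matrices ***/
    for (i = 0; i < SIZE; i++)
        for (j = 0; j < SIZE; j++)
            a[i][j] = i + j;
    for (i = 0; i < SIZE; i++)
        for (j = 0; j < SIZE; j++)
            b[i][j] = i * j;
    for (i = 0; i < SIZE; i++)
        for (j = 0; j < SIZE; j++)
            c[i][j] = 0.0;

    /*** Time 1000 repetitions of c += a * b ***/
    double t1 = omp_get_wtime();
    for (int s = 0; s < 1000; s++) {
        for (i = 0; i < SIZE; i++) {
            for (j = 0; j < SIZE; j++)
                for (k = 0; k < SIZE; k++)
                    c[i][j] += a[i][k] * b[k][j];
        }
    }
    double t2 = omp_get_wtime() - t1;
    cout << t2 << endl;   // elapsed seconds

#if HEAP
    // Release the per-row allocations.
    for (i = 0; i < SIZE; ++i) {
        delete[] a[i];
        delete[] b[i];
        delete[] c[i];
    }
    delete[] a;
    delete[] b;
    delete[] c;
#endif
}

int main(int argc, char** argv) {
    int size = 64;
    if (argc > 1) size = atoi(argv[1]);
    warmup2(size);
    return 0;
}
```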
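One thing that may be worth ruling out in the heap vs stack comparison: the `HEAP` path allocates every row separately, so rows are not guaranteed to be contiguous and each element access goes through an extra pointer load, while the stack arrays are one contiguous block. Below is a minimal sketch of a heap variant with the same row-major layout as the stack case. The name `matmul_flat` and the use of `std::vector` are illustrative assumptions, not part of the benchmark above, and this has not been measured on TX2.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not in the original benchmark): computes c += a * b
// for n x n matrices, each stored row-major in one contiguous heap buffer.
void matmul_flat(const std::vector<double>& a,
                 const std::vector<double>& b,
                 std::vector<double>& c,
                 std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                // Element (i, j) lives at index i * n + j, as in a stack array.
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}
```

To use it, create three `std::vector<double>` buffers of `n * n` elements, fill them the same way as in `warmup2`, and call `matmul_flat(a, b, c, n)` inside the timed loop. If the heap vs stack gap shrinks with this layout, the difference is more likely due to memory layout than to where the memory comes from.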

Run results:

Matrix size 128

taskset -c 0 ./dgemm 128 [A57]
9.22673
taskset -c 1 ./dgemm 128 [Denver]
20.4897

Matrix size 64

taskset -c 0 ./dgemm 64 [A57]
0.792225
taskset -c 1 ./dgemm 64 [Denver]
0.618031

System config:

SOC family:tegra186 Machine:quill

Online CPUs: 0-5

CPU Cluster Switching: Disabled

cpu0: Gonvernor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200

cpu1: Gonvernor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200

cpu2: Gonvernor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200

cpu3: Gonvernor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200

cpu4: Gonvernor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200

cpu5: Gonvernor=performance MinFreq=345600 MaxFreq=2035200 CurrentFreq=2035200

GPU MinFreq=114750000 MaxFreq=1300500000 CurrentFreq=114750000

EMC MinFreq=40800000 MaxFreq=1866000000 CurrentFreq=1866000000 FreqOverride=1

Fan: speed=0

Hi,
We have observed the same behavior on TX2. Please refer to the entry "[TX2] Denver cores not working on TX2" at
https://elinux.org/Jetson/L4T/r32.4.x_patches