Here are the clpeak results:
Platform: Portable Computing Language
  Device: Xavier
    Driver version : 1.6 (Linux ARM64)
    Compute units : 8
    Clock frequency : 1377 MHz

    Global memory bandwidth (GBPS)
      float : 84.52
      float2 : 107.46
      float4 : 106.80
      float8 : 107.15
      float16 : 105.47

    Single-precision compute (GFLOPS)
      float : 1355.57
      float2 : 1403.25
      float4 : 1398.78
      float8 : 1394.55
      float16 : 1384.85

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double : 44.03
      double2 : 43.96
      double4 : 43.85
      double8 : 43.57
      double16 : 43.16

    Integer compute (GIOPS)
      int : 1367.98
      int2 : 1400.67
      int4 : 1391.98
      int8 : 1399.31
      int16 : 1398.18

    Integer compute Fast 24bit (GIOPS)
      int : 1367.96
      int2 : 1400.73
      int4 : 1392.01
      int8 : 1399.45
      int16 : 1398.25

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer : 8.07
      enqueueReadBuffer : 8.22
      enqueueWriteBuffer non-blocking : 8.29
      enqueueReadBuffer non-blocking : 8.28
      enqueueMapBuffer(for read) : 23585.76
      memcpy from mapped ptr : 8.39
      enqueueUnmap(after write) : 13.49
      memcpy to mapped ptr : 8.38

    Kernel launch latency : -30.71 us
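
As a rough sanity check on the compute numbers, the reported compute units and clock can be turned into a theoretical FP32 peak and compared with the measured figures. This is only a sketch: the 64 FP32 lanes per compute unit and the 2 FLOPs per lane per cycle (one fused multiply-add) are assumptions about the Xavier's Volta GPU that clpeak does not report, and the helper name `theoretical_peak_gflops` is made up for illustration.

```python
def theoretical_peak_gflops(compute_units, clock_mhz,
                            lanes_per_cu=64, flops_per_lane_per_cycle=2):
    """Theoretical peak in GFLOPS.

    lanes_per_cu and flops_per_lane_per_cycle are assumptions
    (64 FP32 lanes per compute unit, one FMA = 2 FLOPs per cycle);
    they are not part of the clpeak output.
    """
    flops_per_cycle = compute_units * lanes_per_cu * flops_per_lane_per_cycle
    return flops_per_cycle * clock_mhz / 1000.0  # MHz -> GFLOPS


# Values taken from the clpeak output above.
COMPUTE_UNITS = 8
CLOCK_MHZ = 1377
BEST_FP32_GFLOPS = 1403.25   # float2 row
BEST_FP64_GFLOPS = 44.03     # double row

peak = theoretical_peak_gflops(COMPUTE_UNITS, CLOCK_MHZ)
fp32_efficiency = BEST_FP32_GFLOPS / peak
fp64_to_fp32_peak = BEST_FP64_GFLOPS / peak

print(f"theoretical FP32 peak : {peak:.2f} GFLOPS")
print(f"measured / peak (FP32): {fp32_efficiency:.1%}")
print(f"FP64 / FP32 peak      : 1/{1 / fp64_to_fp32_peak:.1f}")
```

Under those assumptions the peak works out to about 1410 GFLOPS, so the single-precision and integer results sit very close to it, and the double-precision result is roughly 1/32 of it.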