Jetson module evaluation

When evaluating the Jetson series modules, I found that the GPU test results were inconsistent with my expectations.
For the test, I flashed the official Linux system and set the power mode of the Jetson module to maximum before benchmarking the GPU.
I got the following test results:

The test code is as follows:
test.zip (1006 Bytes)

Hi,

Would you mind sharing more info about the expectations?
Also, here are some suggestions for the benchmark sample:

1. Please add a warm-up loop to test.c so that no initialization latency is included.
2. Please move the synchronization outside of the loop to allow parallelism.
For example:

void TestComputation(double* data, int grid_size, int block_size) {
  // Kernel launches are asynchronous, so they queue up without blocking the host.
  for (int i = 0; i < 1000000; i++) {
    TestComputation_cu<<<grid_size, block_size>>>(data);
  }
  // Synchronize once after the loop instead of after every launch.
  cudaDeviceSynchronize();
}
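Combining both suggestions, here is a minimal sketch of a full measurement, assuming the kernel in test.zip is named TestComputation_cu as in the snippet above, and using CUDA events for timing:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Sketch only: TestComputation_cu is assumed to be the kernel from the
// attached test.c; adjust the warm-up and iteration counts to your case.
void TimedTestComputation(double* data, int grid_size, int block_size) {
  // 1. Warm-up launches so one-time costs (context setup, module load,
  //    clock ramp-up) are excluded from the measurement.
  for (int i = 0; i < 100; i++) {
    TestComputation_cu<<<grid_size, block_size>>>(data);
  }
  cudaDeviceSynchronize();

  // 2. Timed region: launch asynchronously and synchronize once at the end,
  //    so per-launch overhead is amortized across all iterations.
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start);
  for (int i = 0; i < 1000000; i++) {
    TestComputation_cu<<<grid_size, block_size>>>(data);
  }
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  printf("Average kernel time: %f us\n", ms * 1000.0f / 1000000.0f);

  cudaEventDestroy(start);
  cudaEventDestroy(stop);
}
```

cudaEventElapsedTime measures the GPU-side interval between the two recorded events, so the result is not distorted by host-side launch latency.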

Thanks.

I hope that through test cases I can accurately measure the gap between the GPUs of different platforms.
However, my test results are not consistent with the results published at Jetson Benchmarks | NVIDIA Developer on the official website.

Hi,

The Jetson benchmark is tested with DNN inference, which usually involves complicated computational work.
Your test source, by contrast, is mainly bottlenecked by memory I/O operations.

If you want to see results similar to the Jetson benchmark, you can run it directly, since its source code is publicly available:

Thanks.

Hello,

What method should I use as a standard to evaluate GPU computing power across different platforms?

Thanks.

Hi,

In general, you can find the CPU/GPU/total power with nvpmodel or tegrastats.
However, DLA power consumption cannot be measured.
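For reference, a typical sequence on the device looks like the following. Note that the nvpmodel mode ID for maximum power and the rail names printed by tegrastats vary between Jetson modules, so check your module's documentation:

```shell
# Select the maximum-power mode (mode IDs differ per module; 0 is common).
sudo nvpmodel -m 0

# Lock clocks at their maximum so results are repeatable.
sudo jetson_clocks

# Print utilization and per-rail power once per second while the test runs.
sudo tegrastats --interval 1000
```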

Thanks.