When I was evaluating the Jetson series modules, I found that the GPU test results were inconsistent with my expectations.
For the test, I flashed the official Linux system image and set the Jetson module's power mode to the maximum before benchmarking the GPU.
I got the following test results:
Would you mind sharing more information about your expectations?
Also, here are some suggestions for the benchmark sample:
1. Please add a warm-up loop to test.c so that no initialization latency is included.
2. Please move the synchronization outside of the loop so the asynchronous kernel launches are not serialized.
For example:
void TestComputation(double* data, int grid_size, int block_size) {
    for (int i = 0; i < 1000000; i++) {
        // Kernel launches are asynchronous, so the loop only enqueues work.
        TestComputation_cu<<<grid_size, block_size>>>(data);
    }
    // Synchronize once after the loop instead of once per iteration.
    cudaDeviceSynchronize();
}
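For reference, here is a rough sketch of how both suggestions could look together with cudaEvent-based timing. This is only an illustration: the helper name TestComputationTimed and the warm-up count of 10 are arbitrary choices, and TestComputation_cu is assumed to be the kernel defined in your test.c.

#include <cuda_runtime.h>
#include <cstdio>

__global__ void TestComputation_cu(double* data);  // assumed to be defined in test.c

void TestComputationTimed(double* data, int grid_size, int block_size) {
    // Warm-up launches absorb one-time initialization cost (context setup, caches).
    for (int i = 0; i < 10; i++) {
        TestComputation_cu<<<grid_size, block_size>>>(data);
    }
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < 1000000; i++) {
        TestComputation_cu<<<grid_size, block_size>>>(data);  // asynchronous launches
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // single synchronization after the timed loop

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("average kernel time: %f us\n", ms * 1000.0f / 1000000.0f);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}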
I hope that with these test cases I can accurately measure the gap between the GPUs on different platforms.
However, I found that my test results are not consistent with the results published at Jetson Benchmarks | NVIDIA Developer on the official website.
The Jetson benchmark is measured with DNN inference, which usually involves heavy computational work.
Your test source, on the other hand, is mainly bottlenecked by memory I/O operations.
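To make the distinction concrete, here is a minimal, self-contained sketch (it is not your test.c; the kernel names, the 16M-element size, and the 2048-iteration inner loop are made up for illustration). The streaming kernel moves one double in and out per thread and is limited by memory bandwidth, while the arithmetic kernel performs thousands of multiply-adds per element and is limited by compute throughput, which is closer to what DNN inference stresses.

#include <cuda_runtime.h>
#include <cstdio>

// Bandwidth-limited: one read and one write per element, almost no arithmetic.
__global__ void StreamingKernel(double* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0;
}

// Compute-limited: thousands of multiply-adds per element loaded.
__global__ void ArithmeticKernel(double* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double x = data[i];
        for (int k = 0; k < 2048; k++) {
            x = x * 1.0000001 + 1e-7;
        }
        data[i] = x;
    }
}

int main() {
    const int n = 1 << 24;  // 16M doubles (~128 MB); arbitrary size for illustration
    const int block = 256;
    const int grid = (n + block - 1) / block;

    double* d = nullptr;
    cudaMalloc(&d, n * sizeof(double));
    cudaMemset(d, 0, n * sizeof(double));

    // Warm up both kernels so initialization is not timed.
    StreamingKernel<<<grid, block>>>(d, n);
    ArithmeticKernel<<<grid, block>>>(d, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms = 0.0f;

    cudaEventRecord(start);
    StreamingKernel<<<grid, block>>>(d, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("streaming  (memory-bound):  %.3f ms\n", ms);

    cudaEventRecord(start);
    ArithmeticKernel<<<grid, block>>>(d, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("arithmetic (compute-bound): %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}

A memory-bound case like the first kernel scales with DRAM bandwidth, which differs between Jetson modules and other platforms in a different ratio than compute throughput does, so it will not track the DNN-inference numbers on the Jetson Benchmarks page.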
If you want to see results similar to the Jetson benchmark, you can run it directly since the source code is publicly available: