Hi, I have two machines with exactly the same hardware.
One runs Debian, and the other runs a custom Linux image built with Buildroot.
The problem I’m experiencing is this:
My application runs roughly 2x faster on the Debian machine than on the machine with the custom-built image.
I’m trying to understand what could cause this difference.
The CUDA version is 7.5 on both machines.
The driver versions are:
Some information that may be relevant:
If I run deviceQuery (from the CUDA samples) on both machines, the results are almost, but not exactly, the same.
The first difference is in this line:
This is for Debian:
Total amount of global memory: 3069 MBytes
And this is for Custom:
Total amount of global memory: 3008 MBytes
(Though I doubt that this difference can cause the mentioned difference in performance.)
The other difference is:
Run time limit on kernels: Yes
Run time limit on kernels: No
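In case it helps with comparing the full outputs, here is a minimal Python sketch of how two saved deviceQuery outputs could be diffed property by property. It assumes the properties appear as "name: value" lines, as in the excerpts above. The function names parse_device_query and diff_properties are made up for illustration and are not part of the CUDA samples.

```python
def parse_device_query(text):
    """Turn 'name: value' lines from deviceQuery output into a dict."""
    props = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            props[key.strip()] = value.strip()
    return props


def diff_properties(a, b):
    """Return {name: (value_a, value_b)} for every property that differs."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs


# Shortened placeholder inputs: only the global-memory line from each
# machine is shown here; in practice the full saved outputs would be used.
debian_output = "Total amount of global memory: 3069 MBytes\n"
custom_output = "Total amount of global memory: 3008 MBytes\n"

differences = diff_properties(
    parse_device_query(debian_output),
    parse_device_query(custom_output),
)
for name, (debian, custom) in differences.items():
    print(f"{name}: Debian={debian} | Custom={custom}")
```

Lines without a colon are skipped, and a property that is present on only one machine shows up with None on the other side.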
If I run bandwidthTest from the samples, the results are more or less the same for "Host to Device Bandwidth, 1 Device(s)" and "Device to Host Bandwidth, 1 Device(s)", but can differ significantly for
"Device to Device Bandwidth, 1 Device(s)".
For this last entry, the value on Debian is around 100k MB/s (though it sometimes drops to around 63k).
On Custom it’s consistently around 63k.
I would be glad for any advice on what I should investigate further and what the problem could be.