When we tried to port our CUDA source code from the TX1 to the TX2, it behaved strangely.
The TX2's GPU is supposed to have about twice the computing power of the TX1's, so we expected the TX2 to be at least 30% to 40% faster.
Unfortunately, most of our code base takes about twice as long on the TX2 as on the TX1; in other words, the TX2 mostly runs at half the TX1's speed. After logging the time of every small step that invokes CUDA APIs, we believe that CUDA API calls on the TX2 are slower than on the TX1 in many cases.
Here's a third-party public repo that reproduces the problem:
We tested it on both the TX1 and the TX2 using the example image:
TX1: about 57 ms per frame (17 fps).
TX2: about 258 ms per frame (3 fps).
That is more than four times slower.
TX1: Ubuntu 14.04, CUDA 7.0.74, normal usage, no extra power settings
TX2: Ubuntu 16.04, CUDA 8.0.62, nvpmodel -m 0
Any suggestions on how to improve the TX2's GPU performance?