[Solved] Worse performance on DGX-1 than on Titan X


I implemented an iterative algorithm using CUDA 8.0 and three Titan X cards on Ubuntu 16.04, and I got good performance (100 seconds, compared with 300 seconds on a CPU with 40 cores/80 threads).

After testing, I deployed the app to a DGX-1 GPU server (nvidia-docker 2, one Tesla V100 card, driver version 410, CUDA version 10.0) in a GPU container with CUDA 8.0 (no driver). Unfortunately, the performance there (500 seconds) is much worse than that of my original Titan X version.

Could you please give me some hints on what is wrong with my experiment?

Thank you.
Best wishes,
Jia Sen

This reads as if you deployed a CUDA 8.0 binary, built for the Pascal architecture, on a machine with a Volta GPU?

Before judging performance, you may want to rebuild with the CUDA 10 compiler, targeting the Volta architecture.
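
One quick way to confirm this kind of mismatch is to ask the CUDA runtime which architecture a kernel was actually built for. The following is only a minimal sketch, not the original poster's code: `dummyKernel` and `targetsOlderArch` are hypothetical names, and it assumes device 0 is the GPU of interest. It compares the device's compute capability with the `ptxVersion` and `binaryVersion` fields filled in by `cudaFuncGetAttributes`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; substitute one of your own kernels.
__global__ void dummyKernel(float *x) {
    if (x != nullptr) x[threadIdx.x] += 1.0f;
}

// Pure host helper. All values use the "major * 10 + minor" encoding
// (e.g. 70 for Volta, 60 or 61 for Pascal). Returns true when the kernel
// was compiled for an older architecture than the GPU it runs on.
bool targetsOlderArch(int deviceCC, int ptxVersion, int binaryVersion) {
    return ptxVersion < deviceCC || binaryVersion < deviceCC;
}

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // assumes device 0
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    const int deviceCC = prop.major * 10 + prop.minor;

    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, dummyKernel);
    if (err != cudaSuccess) {
        // Typically means the binary holds no code usable on this GPU.
        std::printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("device %s: compute capability %d\n", prop.name, deviceCC);
    std::printf("kernel: ptxVersion %d, binaryVersion %d\n",
                attr.ptxVersion, attr.binaryVersion);
    if (targetsOlderArch(deviceCC, attr.ptxVersion, attr.binaryVersion)) {
        std::printf("Kernel was built for an older architecture; "
                    "consider rebuilding for this GPU.\n");
    }
    return 0;
}
```

With the CUDA 10 toolkit, a build line along the lines of `nvcc -gencode arch=compute_70,code=sm_70 check_arch.cu -o check_arch` targets the V100 natively (the file name is made up for this example). CUDA 8.0 predates Volta, so its `nvcc` has no `sm_70` target.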

The next step would be profiling, on both the Titan X and the DGX-1, to find out where the code spends most of its time.
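
Before reaching for a full profiler such as `nvprof`, a rough first measurement can be taken with CUDA events around a suspect kernel and compared across the two machines. This is a sketch under stated assumptions: `iterateStep` is a hypothetical stand-in for one step of the iterative algorithm, `averageMs` is an illustrative helper, and the problem size and iteration count are arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one step of the iterative algorithm.
__global__ void iterateStep(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

// Pure host helper: average time per iteration in milliseconds.
float averageMs(float totalMs, int iterations) {
    return iterations > 0 ? totalMs / iterations : 0.0f;
}

int main() {
    const int n = 1 << 20;        // arbitrary problem size
    const int iterations = 100;   // arbitrary iteration count
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time start-up costs are not counted.
    iterateStep<<<blocks, threads>>>(d, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int it = 0; it < iterations; ++it) {
        iterateStep<<<blocks, threads>>>(d, n);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until all timed launches finish

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    float totalMs = 0.0f;
    cudaEventElapsedTime(&totalMs, start, stop);
    std::printf("total %.3f ms, average %.3f ms per iteration\n",
                totalMs, averageMs(totalMs, iterations));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```

Running the same binary on both machines gives a like-for-like number per kernel; a profiler is still the better tool for seeing transfers, launch overhead, and the full timeline.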


Hi Christian,

Thank you for your helpful suggestion. After I replaced CUDA 8.0 in my Docker image with CUDA 10, I got good performance on my DGX-1.

Best wishes,
Jia Sen