eGPU box with Thunderbolt 3 performance

Hi all,

I have three identical GPU cards. Two of them are installed directly in the PCIe slots of the motherboard (ASUS Prime X299-Deluxe) and, due to the space limits of the PC case, the third one is installed in an external eGPU box (ASUS XG Station Pro) connected through a Thunderbolt 3 port.

My OS is Ubuntu 18.04.2, the NVIDIA driver version is 440.44, and the CUDA version is 10.0. All three cards are recognized successfully by the OS.

I tested the computational speed of these three cards with my deep-learning program (TensorFlow 1.14, GPU version) and found that the eGPU one is about two times slower than the other two. I also ran tests with the GPU version of VASP as

mpirun -np 8 vasp_gpu

All three cards were used, but the system went dead shortly afterwards.

If I unplug the eGPU box and only use the two GPU cards installed on the motherboard, the VASP software runs fine.
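For what it's worth, instead of physically unplugging the box, the eGPU can also be hidden from the CUDA runtime with the CUDA_VISIBLE_DEVICES environment variable. A minimal sketch, assuming the two motherboard cards enumerate as devices 0 and 1 (the actual order can be checked with nvidia-smi):

# expose only the two motherboard GPUs to VASP (indices 0 and 1 are an assumption)
CUDA_VISIBLE_DEVICES=0,1 mpirun -np 8 vasp_gpu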

Has anyone tested the computational speed of an eGPU box connected over Thunderbolt 3? And could the speed difference between the three cards be what causes VASP to bring down the OS?

Thanks for your help.

Bo-Yuan

An important aspect with an eGPU is the number of PCIe lanes available to the Thunderbolt controller. Do you know if it's single, double, or quad?

For starters, you might want to run bandwidthTest from the NVIDIA samples to check whether there is a major difference in the eGPU box connection. Then you might want to try a few of the compute-heavy (single-GPU) samples to get a baseline. If they're the same card and there are no memory transfers, they should perform the same.
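A minimal sketch of that first check, assuming the CUDA 10.0 samples are installed in the default ~/NVIDIA_CUDA-10.0_Samples location (adjust the path and device indices to your setup):

cd ~/NVIDIA_CUDA-10.0_Samples/1_Utilities/bandwidthTest
make
./bandwidthTest --device=0   # one of the motherboard GPUs
./bandwidthTest --device=2   # the eGPU (index is an assumption; check with nvidia-smi)

A large drop in host-to-device and device-to-host bandwidth on the eGPU card would point to the Thunderbolt/PCIe link rather than the card itself.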

I’ve never used VASP, so I can’t speak to that issue.

Hi mnicely,

Thank you very much for the reply. I checked the manual of the Thunderbolt card, and its PCIe 3.0 link is x4. The other two GPUs installed directly on the motherboard have PCIe 3.0 x16 links. I am not sure, but I guess the difference in PCIe lane bandwidth may be what leads to the performance difference. Anyway, I will use the NVIDIA samples to run the bandwidth tests for each GPU. Again, thanks a lot.
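For reference, PCIe 3.0 x4 tops out at roughly 4 GB/s versus about 16 GB/s for x16, so a noticeable gap in transfer-heavy workloads is expected. The negotiated link width and generation can also be checked at runtime with nvidia-smi (the query fields below are listed by nvidia-smi --help-query-gpu):

nvidia-smi --query-gpu=index,name,pcie.link.gen.current,pcie.link.width.current --format=csv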