After reinstalling Ubuntu on my PC, I installed all the libraries and drivers and then ran caffe time with the same model to measure the forward pass time.
Before I reinstalled the OS, the forward time was 6 ms. Now it is 8 ms on the first run after installing the driver, and from the second run onward it is 17 ms and stays there. That is much slower.
Why is that?
My OS is Ubuntu 16.04, the GPU is a GTX 1080 Ti, and the driver is NVIDIA-Linux-x86_64-410.73.run.
CUDA: cuda_9.0.176_384.81_linux.run.
cuDNN: libcudnn7-dev_7.3.1.20-1+cuda9.0_amd64.deb.
The Caffe config file is exactly the same as in the old setup, where the forward time was 6 ms.
Definitely try cbuchner’s advice, although a 2x slowdown at the application level seems unlikely to be triggered by a driver update.
(1) Confirm that no changes were made to the hardware, including things like moving cards between PCIe slots, BIOS settings, etc.
(2) Double-check the software configuration settings. Unless you are using a pilot-style detailed checklist, it is extremely easy to overlook some detail; I have been there, done that, and spent most of a day tracking down the cause of my troubles.
(3) How carefully did you record the previous performance numbers? I have done a lot of benchmarking in my life, and even with reams of raw data recorded in engineering notebooks, it is easy to become confused: to mix up measurements for different parts of the software, or to record the wrong software version. In some such cases, when I repeated my experiments to explore discrepancies, I could no longer reproduce what I had recorded earlier.
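One way to make point (3) less error-prone is to store every set of timings together with the configuration it was measured on. The following is only a minimal Python sketch of that idea. The function names, the JSON-lines file format, and the keys used in `config` are hypothetical choices for illustration, not part of Caffe or any NVIDIA tool, and the timings still have to be copied in by hand from the caffe time output.

```python
import json
import statistics
from datetime import datetime, timezone


def make_record(label, times_ms, config):
    """Bundle repeated timings with the configuration they were measured on."""
    return {
        "label": label,                      # free-form name of what was timed
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "config": dict(config),              # e.g. driver, CUDA, cuDNN versions
        "runs": len(times_ms),
        "min_ms": min(times_ms),
        "median_ms": statistics.median(times_ms),
        "max_ms": max(times_ms),
    }


def append_record(path, record):
    """Append one record as a single JSON line; older entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
```

For example, `append_record("benchmarks.jsonl", make_record("forward", [8.0, 17.0, 17.0], {"driver": "410.73", "cuda": "9.0.176", "cudnn": "7.3.1.20"}))` would log the three runs described above next to the versions they were measured with. Keeping the minimum, median, and maximum rather than a single number makes a first-run outlier like the 8 ms result visible instead of hiding it in an average.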
Well, I went back to the old versions of Ubuntu, the driver, CUDA, and cuDNN, and it seems OK on one GPU: the forward time is 6 ms.
On the other 1080 Ti, the caffe time is still 12 ms, which is strange.
So if I understand correctly, you have two GTX 1080 Ti GPUs in your system, and one is “fast” and the other is “slow”? Do they have identical specifications (same brand, same SKU)?
If you physically swap the GPUs in their PCIe slots, does the “slowness” follow the GPU or does it stay with a particular slot?
If the slowness correlates with the GPU, double-check cooling and power supply. Is the same VBIOS installed in both GPUs? When you monitor with nvidia-smi, do you see any significant differences between the GPUs (load factor, bus utilization, temperature, clocks)?
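The nvidia-smi comparison can be done by eye, but a small script makes differences harder to miss. The sketch below assumes Python 3.7 or later and that nvidia-smi is on the PATH. It uses the `--query-gpu` CSV interface, and the helper names (`query_gpus`, `parse_gpus`, `differing_fields`) are made up for this example. The field list is just one reasonable selection; `nvidia-smi --help-query-gpu` prints the full set supported by the installed driver.

```python
import csv
import io
import subprocess

# Query fields accepted by `nvidia-smi --query-gpu`.
FIELDS = ["index", "name", "vbios_version", "temperature.gpu",
          "clocks.sm", "clocks.mem", "utilization.gpu", "power.draw"]


def query_gpus():
    """Run nvidia-smi once and return its CSV output as text."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def parse_gpus(text, fields=FIELDS):
    """Turn the CSV text into one dict per GPU, keyed by field name."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(fields, (v.strip() for v in row))) for row in rows if row]


def differing_fields(gpus, ignore=("index",)):
    """Return {field: [value per GPU]} for every field that is not identical."""
    diffs = {}
    for field in (gpus[0] if gpus else []):
        values = [g[field] for g in gpus]
        if field not in ignore and len(set(values)) > 1:
            diffs[field] = values
    return diffs
```

Calling `differing_fields(parse_gpus(query_gpus()))` while caffe time is running on both GPUs lists only the fields where the two cards disagree. Temperature, power draw, and utilization will rarely match exactly, so small differences there are expected; a different `vbios_version` or a much lower `clocks.sm` on the slow card would be the interesting findings.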
If the slowness correlates with the PCIe slot, double-check the PCIe slot configuration. One slot may be a full x16 while the other is only x4 electrically, although both may use a mechanical x16 form factor.
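The negotiated link width can also be read without opening the case. The following Python sketch again assumes Python 3.7 or later and nvidia-smi on the PATH, and uses the `pcie.link.*` query fields. The function names and the default of 16 expected lanes are assumptions made for this example.

```python
import subprocess

# PCIe-related query fields accepted by `nvidia-smi --query-gpu`.
QUERY = "index,pcie.link.gen.current,pcie.link.width.current,pcie.link.width.max"


def query_links():
    """Run nvidia-smi once and return the PCIe link report as CSV text."""
    cmd = ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def parse_links(text):
    """Parse lines such as '0, 3, 16, 16' into one dict per GPU."""
    links = []
    for line in text.strip().splitlines():
        index, gen, width, width_max = (int(part) for part in line.split(","))
        links.append({"index": index, "gen": gen,
                      "width": width, "width_max": width_max})
    return links


def narrow_links(links, expected_width=16):
    """Return the GPUs whose current link has fewer lanes than expected."""
    return [link for link in links if link["width"] < expected_width]
```

`narrow_links(parse_links(query_links()))` returns an empty list when every GPU currently has the expected 16 lanes. It is best sampled while the benchmark is running, because the reported link generation can drop while the GPU is idle. If a card in a mechanical x16 slot reports a width of 4 or 8, the motherboard manual should say how that slot is wired and whether it shares lanes with other slots.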