I have a Titan X Pascal, an Intel i5-6600, and 16 GB of RAM, and I am running Torch7 on Ubuntu 14.04. The NVIDIA driver version is 375.20, with CUDA Toolkit 8.0 and cuDNN v5.1. I want to use this machine for deep learning on computer vision problems.
I ran the same test as this benchmark, using the same VGG16 network from Caffe (imported via loadcaffe): https://github.com/jcjohnson/cnn-benchmarks. However, a forward pass takes about 80 ms on my setup, which is roughly double the time reported in the benchmark (~40 ms).
The input is a randomly generated batch of 16 images with 3 channels and size 224x224. The relevant code is:
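    require 'loadcaffe'
    require 'cutorch'
    require 'cudnn'  -- needed for the "cudnn" backend passed to loadcaffe

    -- Load the Caffe VGG16 model with cuDNN layers
    local model = loadcaffe.load(
      "/home/.../Models/VGG16/VGG_ILSVRC_16_layers_deploy.prototxt",
      "/home/.../Models/VGG16/VGG_ILSVRC_16_layers.caffemodel",
      "cudnn")

    for i = 1, 50 do
      -- Random batch of 16 RGB images, 224x224, moved to the GPU
      local input = torch.randn(16, 3, 224, 224):type("torch.CudaTensor")

      -- Synchronize before and after so only the forward pass is timed
      cutorch.synchronize()
      local timer = torch.Timer()
      model:forward(input)
      cutorch.synchronize()

      local deltaT = timer:time().real
      print("Forward time: " .. deltaT)
    end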
The output is:
    Forward time: 0.96536016464233
    Forward time: 0.10063600540161
    Forward time: 0.096444129943848
    Forward time: 0.089151859283447
    Forward time: 0.082037925720215
    Forward time: 0.082045078277588
    Forward time: 0.079913139343262
    Forward time: 0.080273866653442
    Forward time: 0.080694913864136
    Forward time: 0.082727193832397
    Forward time: 0.082070827484131
    Forward time: 0.079407930374146
    Forward time: 0.080456018447876
    Forward time: 0.083559989929199
    Forward time: 0.082060098648071
    Forward time: 0.081624984741211
    Forward time: 0.080413103103638
    Forward time: 0.083755016326904
    Forward time: 0.083209037780762
    ...
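To make the comparison with the benchmark concrete, the steady-state time can be estimated by discarding the first few iterations, which presumably include one-time setup cost, and averaging the rest. Here is a minimal sketch in plain Lua using the times printed above. The cutoff of four warm-up iterations and the helper name `mean_after_warmup` are my own assumptions, not something taken from the benchmark.

```lua
-- Forward times in seconds, copied from the printed output above.
local times = {
  0.96536016464233, 0.10063600540161, 0.096444129943848, 0.089151859283447,
  0.082037925720215, 0.082045078277588, 0.079913139343262, 0.080273866653442,
  0.080694913864136, 0.082727193832397, 0.082070827484131, 0.079407930374146,
  0.080456018447876, 0.083559989929199, 0.082060098648071, 0.081624984741211,
  0.080413103103638, 0.083755016326904, 0.083209037780762,
}

-- Assumption: treat the first 4 iterations as warm-up and ignore them.
local WARMUP = 4

-- Hypothetical helper: mean of t[skip+1 .. #t].
local function mean_after_warmup(t, skip)
  local sum, n = 0, 0
  for i = skip + 1, #t do
    sum = sum + t[i]
    n = n + 1
  end
  assert(n > 0, "no samples left after skipping warm-up iterations")
  return sum / n
end

local steady = mean_after_warmup(times, WARMUP)
print(string.format("Mean forward time after warm-up: %.1f ms", steady * 1000))
```

Averaging only the later iterations keeps the 0.97 s first pass from distorting the figure that is compared against the benchmark.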
Do I have to do anything additional to reach the speed reported in the benchmark? Or am I doing something wrong here? Or could it be because I am using Ubuntu 14.04 instead of Ubuntu 16.04 (although in the benchmark a GTX 1080 running on Ubuntu 14.04 also needs only 60 ms)?