TITAN X Pascal Performance

Hi all,

We recently obtained the legendary NVIDIA TITAN X (Pascal) at the lab and I’m just wondering about its performance.

We will mainly use it for CNNs with images.
More specifically, we already have a trained network and would like to produce output images of size 1920×1080 through this network at ~20 fps (real time). I’m currently using the cuDNN library for the network.
I have some questions about its potential performance:

  1. In what ways is the TITAN X (Pascal) better than previous GPUs such as the GTX 1080?

  2. Is convolution further optimized on the TITAN X (Pascal) compared to the GTX 1080?
    I’ve heard that the TITAN X has enhanced features for deep learning applications. Does anyone know in what way it’s better?

  3. Since I’m using cuDNN, do the functions in this library take into account the fact that I am now using the TITAN X (Pascal)? For example, when calling the cudnnGetConvolutionForwardAlgorithm function for each convolutional layer, I can see that it chooses the same algorithms as on the GTX 1080 (see the sketch after this list for roughly how I call it). But I thought this function was machine-dependent?

  4. We have four TITAN Xs in a single computer. Any comments about multi-GPU computing?
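
For reference, here is roughly how the per-layer query in question 3 looks in my code. Descriptor setup is omitted and the argument names are just placeholders for the layer's existing descriptors; a cuDNN 5-style API is assumed:

```c
#include <cudnn.h>

/* Ask cuDNN's heuristic which forward algorithm to use for one convolutional
   layer. The handle and descriptors are created/configured elsewhere. */
cudnnConvolutionFwdAlgo_t pick_fwd_algo(cudnnHandle_t handle,
                                        cudnnTensorDescriptor_t xDesc,
                                        cudnnFilterDescriptor_t wDesc,
                                        cudnnConvolutionDescriptor_t convDesc,
                                        cudnnTensorDescriptor_t yDesc)
{
    cudnnConvolutionFwdAlgo_t algo = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
    cudnnGetConvolutionForwardAlgorithm(
        handle, xDesc, wDesc, convDesc, yDesc,
        CUDNN_CONVOLUTION_FWD_PREFER_FASTEST,
        /* memoryLimitInBytes = */ 0,
        &algo);
    return algo;   /* in my runs this returns the same algo on both cards */
}
```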

Side question: Can cuDNN be used commercially? To be more specific, am I allowed to use cuDNN functions in an algorithm that will ship in a commercial product? The NVIDIA Parallel Forall blog says that cuDNN can be used for any purpose, including commercial ones, and there’s nothing special regarding this in the license document. However, I just want to be sure.

That was a lot of questions… I would be very grateful for any comments regarding the above matter!
(I’m just curious about the TITAN X in general)

Thank you.

The GTX 1080 and the TITAN X (Pascal) are the same architecture, compute capability 6.1. It would be quite unusual for machine-dependent code to make finer distinctions than by architecture version. To first order, you can treat the TITAN X as a faster GTX 1080, with more cores, more memory, and higher memory throughput.
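
If you want to see what a specific card actually prefers, rather than what the heuristic suggests, you can time all forward algorithms on the device itself with cudnnFindConvolutionForwardAlgorithm. A minimal sketch, assuming a cuDNN 5-style API and that the layer descriptors are already configured as in your existing code:

```c
#include <stdio.h>
#include <cudnn.h>

/* Benchmark every forward algorithm for one layer on the GPU bound to `handle`
   and print the measured runtimes. Descriptors are assumed set up elsewhere. */
void benchmark_fwd_algos(cudnnHandle_t handle,
                         cudnnTensorDescriptor_t xDesc,
                         cudnnFilterDescriptor_t wDesc,
                         cudnnConvolutionDescriptor_t convDesc,
                         cudnnTensorDescriptor_t yDesc)
{
    cudnnConvolutionFwdAlgoPerf_t perf[8];   /* room for all forward algorithms */
    int returned = 0;
    if (cudnnFindConvolutionForwardAlgorithm(handle, xDesc, wDesc, convDesc, yDesc,
                                             8, &returned, perf) != CUDNN_STATUS_SUCCESS)
        return;
    for (int i = 0; i < returned; ++i)       /* results are sorted fastest first */
        if (perf[i].status == CUDNN_STATUS_SUCCESS)
            printf("algo %d: %.3f ms, workspace %zu bytes\n",
                   (int)perf[i].algo, perf[i].time, perf[i].memory);
}
```

On two cards of the same architecture the ranking usually comes out the same; the absolute times should simply be lower on the TITAN X.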

I wouldn’t use cuDNN directly, but rather a performant high-level DNN framework (which uses cuDNN internally) that provides good multi-GPU support for training and inference out of the box.
The MXNet package looks good in this regard; see
http://mxnet.io/ and https://github.com/dmlc/mxnet/blob/master/docs/how_to/multi_devices.md

Thank you!