Cuda vs C/C++ auto-vectorized

FiReTITi · August 21, 2014, 7:41pm

Hi All,

I have to develop a Deep Neural Network. I can make it run on a big cluster with hundreds of CPUs or on a GPU.
I would like to know which would be faster:

developing a CUDA version (I don't mind the time required to transfer data between host and GPU memory), or
developing a C/C++ auto-vectorized version that I can run on multiple CPUs.

Does anyone have an idea?

I also have a second question: since the individual operations are simple (mainly arithmetic operations between arrays), the software will not saturate the GPU. Is it possible to run multiple simple processes in parallel on a single GPU?

Thank you for your help.

Gregory_Diamos · August 21, 2014, 8:58pm

A good implementation of a large neural network (as long as it isn't too sparsely connected) should be FLOP-limited, so a GPU would probably have an advantage close to the ratio of its peak FLOP rate to that of the CPUs.

Note that a simple implementation of a neural network would consist of simple array updates, but, as you mention, these are not compute-intensive enough to saturate a GPU (or a CPU, for that matter). A better implementation would merge multiple array operations into heavier-weight operations, but this also means that you can't use off-the-shelf libraries.

What people commonly do is build networks with locally connected layers and implement the forward propagation with block-sparse (i.e., batched) SGEMM operations. There will still be some additional operations left over, and to really get close to peak you would need to fuse these into the SGEMM kernels (effectively writing your own custom SGEMM).
