How do I know if my GPU will support a convolutional neural network?

Good morning everyone, I have a question about the performance of a CPU or GPU. I want to test a vision algorithm with an arbitrary neural network. What parameters should I analyze to determine whether my algorithm and my neural network can run without problems on a computer with a CPU or GPU?
Regarding GPUs, I have seen that NVIDIA frequently quotes TOPS (tera operations per second). How is that metric calculated? And what should I analyze with respect to the GPU: the number of cores, the transistor count?

I too would be very interested in better understanding the factors considered when choosing CPU/GPU/ASIC/etc. hardware solutions for neural network inference. Thanks!

A GPU is a programmable processor, which means it has a lot of flexibility to solve many different kinds of computational tasks. The same qualitative statement could be made about a CPU.

It’s evident that GPUs are well suited to both neural network training and inference; simply survey the state of the art in either of these disciplines.

Considering work on just a single GPU, the biggest single factor for capability is probably memory size. The size of your neural network (the number of weights and biases, i.e. “parameters”) as well as the size of your data batches, compared to your GPU memory size and allowing for overhead, will be the most proximal indicator of what your GPU is “capable” of. Indeed, you can find many posts from people who ran out of memory on their GPU when trying to run various NN codes. A common piece of advice is to either reduce the batch size or get a GPU with more memory. Very large models, such as large language models with hundreds of billions of parameters, may not fit on a single GPU and may require specialized methods to distribute work across multiple GPUs.

I won’t be able to give you detailed recipes, calculations, spreadsheets, or calculators, to go from abstract discussion of model parameters and data batch sizes, to GPU memory consumption. The current methodology here is strongly biased towards trial and error. But nevertheless some crude statements can be made, such as the one above about models with many billions or trillions of parameters. Likewise, if your smallest dataset batch size is 8GB, it’s unlikely to be workable on a 4GB GPU.
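To make the crude, back-of-the-envelope style of estimate above concrete, here is a minimal sketch. The function names, the FP32 assumption, and the optimizer multiplier are illustrative assumptions on my part, not an official recipe, and activations (which often dominate) are deliberately left out:

```python
def estimate_model_memory_gb(num_params, bytes_per_param=4):
    """Rough memory footprint of the weights alone (FP32 = 4 bytes/param)."""
    return num_params * bytes_per_param / 1e9

def estimate_training_memory_gb(num_params, bytes_per_param=4, optimizer_multiplier=4):
    """Very crude training lower bound: weights + gradients + optimizer state
    (e.g. Adam keeps roughly two extra copies per parameter).
    Activations are NOT included, so real usage will be higher."""
    return estimate_model_memory_gb(num_params, bytes_per_param) * optimizer_multiplier

# Example: a 25M-parameter vision model in FP32
print(estimate_model_memory_gb(25e6))     # 0.1  -> ~0.1 GB for weights alone
print(estimate_training_memory_gb(25e6))  # 0.4  -> ~0.4 GB before activations
```

Treat these numbers as lower bounds; trial and error on your actual GPU remains the practical methodology.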

Performance is a separate issue. More powerful GPUs will generally be more performant.

Mainstream neural network calculations are dominated by the matrix-matrix multiply operation. NVIDIA developed tensor core (TC) units in large part to accelerate it. When doing neural network calculations on an NVIDIA GPU, use of tensor cores should be considered the “fast path”. You can find detailed tensor core calculations here. However, this should be considered a rough guide to what to expect, performance-wise. If your code is making use of tensor cores (i.e. it is doing the layer-wise matrix-matrix multiply operations using a suitable type like FP16), then a GPU with more tensor core throughput will likely run that code faster.
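As a sketch of the kind of rough performance reasoning this enables (the TFLOPS figure and the efficiency factor are hypothetical assumptions, not specs of any particular GPU):

```python
def matmul_flops(m, n, k):
    """An (m x k) @ (k x n) matrix multiply costs about 2*m*n*k FLOPs:
    one multiply plus one add per inner-product term."""
    return 2 * m * n * k

def est_matmul_time_us(m, n, k, peak_tflops=100.0, efficiency=0.5):
    """Crude runtime estimate in microseconds for one matmul on a GPU
    with `peak_tflops` of FP16 tensor-core throughput, assuming the kernel
    achieves `efficiency` of peak (real kernels rarely hit 100%)."""
    return matmul_flops(m, n, k) / (peak_tflops * 1e12 * efficiency) * 1e6

# Example: a batch of 256 inputs through a 4096 -> 4096 fully connected layer
print(matmul_flops(256, 4096, 4096))  # 8589934592 FLOPs (~8.6 GFLOPs)
```

This kind of estimate only bounds the arithmetic; memory bandwidth and launch overheads can easily dominate for small layers.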

Regarding TOPS vs. TFLOPS: when a TC unit is computing with floating-point arithmetic, the throughput is generally quoted in TFLOPS (tera floating-point operations per second); when it is computing with integer arithmetic, the throughput is quoted in TOPS (tera operations per second).
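The way such a peak figure is typically derived is the same for both metrics; only the datatype of the operations differs. A minimal sketch, using entirely made-up GPU specs for illustration:

```python
def peak_tera_ops_per_sec(num_tc_units, ops_per_unit_per_clock, clock_ghz):
    """Peak throughput = units * (ops per unit per clock) * (clocks per second),
    expressed in tera-ops/second. With floating-point ops this is TFLOPS;
    with integer ops it is TOPS."""
    return num_tc_units * ops_per_unit_per_clock * clock_ghz * 1e9 / 1e12

# Hypothetical GPU: 512 TC units, each doing 512 ops per clock, at 1.5 GHz
print(peak_tera_ops_per_sec(512, 512, 1.5))  # 393.216 (tera-ops/s)
```

Note that vendor figures usually count a fused multiply-add (FMA) as two operations, so check how the per-clock number is defined before comparing datasheets.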


Thank you very much for the information, it has been very helpful and very well explained.

I have one last question regarding the batch size. I use this parameter during training; when the size is increased, training is slower but the detections are more accurate. My question is whether this parameter also affects the final result: when deploying the model in an application, will it perform detections more slowly if the batch size is increased?

This particular forum is not really focused on the mechanics of neural network training. You can find various forums that are, for example the PyTorch or TensorFlow forums.

Increasing the batch size normally increases GPU efficiency, which should result in a reduction in training time, all other things being equal. So I can’t explain your observation that increasing the batch size makes training slower, although it may be due to the effects of the parameter server.

Yes, changing nearly anything hyperparameter-wise can affect the final result, including batch size. Ideally, changes in batch size would make only a small change in the result; however, when batch sizes get very large, they can have a noticeable deleterious effect on training convergence and the final result. This is really not the forum to dive into this deeply, and I probably won’t respond to further questions in this vein.


Thank you very much for the answer, it has been very useful for me.