GPU Memory on K80 vs V100

Hi,

We have built an image classification model with a pre-trained ResNet50, using torch==1.4.0 and torchvision==0.5.0.

The model was initially deployed on a K80-based VM in Azure. Later, the application and model were moved to a Tesla V100-based GPU instance on AWS. There, each process takes noticeably more GPU memory than it did on the K80 VM (details pasted below).

Since we also deploy other models on this single-GPU machine, we have started getting out-of-memory errors. The same containers run on the K80 instance without any issue.
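In case it helps with debugging, here is a minimal snippet (my own sketch, not part of the deployed code) that we can run inside one of the containers to compare PyTorch's internal accounting against the nvidia-smi figures. torch.cuda.memory_allocated() and torch.cuda.memory_reserved() exist in torch 1.4.0; the gap between memory_reserved() and the nvidia-smi number is roughly the CUDA context, whose size can differ between GPU architectures:

```python
def mib(nbytes: int) -> float:
    # Convert bytes to MiB so the numbers line up with nvidia-smi's MiB column.
    return nbytes / (1024 ** 2)

try:
    import torch
    if torch.cuda.is_available():
        # memory_allocated: bytes held by live tensors.
        # memory_reserved: total blocks held by PyTorch's caching allocator;
        # nvidia-smi reports this plus the CUDA context overhead.
        print(f"allocated: {mib(torch.cuda.memory_allocated()):.0f} MiB")
        print(f"reserved:  {mib(torch.cuda.memory_reserved()):.0f} MiB")
except ImportError:
    # Allows the snippet to run (as a no-op) on machines without torch.
    pass
```

If allocated/reserved are similar on both machines but nvidia-smi differs, the extra usage is context/library overhead rather than our tensors.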

Please let me know what is causing the GPU memory to spike and how to resolve it.
Thank you

(Since I'm a new user I can't upload screenshots, so I've pasted the nvidia-smi output.)

On the K80, below is the memory usage:

| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   42C    P0    70W / 149W |   2535MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2825      C   /usr/bin/python3                             504MiB |
|    0      2826      C   /usr/bin/python3                             504MiB |
|    0      2827      C   /usr/bin/python3                             504MiB |
|    0      2828      C   /usr/bin/python3                             504MiB |
|    0      2829      C   /usr/bin/python3                             504MiB |
+-----------------------------------------------------------------------------+

On the V100:

| NVIDIA-SMI 440.95.01    Driver Version: 440.95.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2…    Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   44C    P0    38W / 300W |   6376MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3263      C   /usr/bin/python3                            1273MiB |
|    0      3264      C   /usr/bin/python3                            1273MiB |
|    0      3265      C   /usr/bin/python3                            1273MiB |
|    0      3266      C   /usr/bin/python3                            1273MiB |
|    0      3268      C   /usr/bin/python3                            1273MiB |
+-----------------------------------------------------------------------------+