Benchmarking Quadro P2000 Vs P6000 for LSTM training

Andres_V · July 8, 2020, 12:42pm

Hi there,

I am trying to benchmark the two GPUs available in my team for LSTM training: the NVIDIA Quadro P2000 and the Quadro P6000.

The first benchmark results show a much smaller training-speed advantage for the P6000 over the P2000 (~8%) than I would expect, given the P6000's larger memory. Is this a known and accepted level of performance for a P6000 used for RNN training, or am I missing something in the GPU setup?

I am working with:

Environment: MATLAB R2019a
CUDA: cuda_11.0.2_451.48_win10
cuDNN libraries: cudnn-11.0-windows-x64-v8.0.1.13

Information 1st GPU
Name: 'Quadro P2000'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 10.1000
ToolkitVersion: 10
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 5.3687e+09
AvailableMemory: 4.1733e+09
MultiprocessorCount: 8
ClockRateKHz: 1480500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1

Information 2nd GPU
Name: 'Quadro P6000'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11
ToolkitVersion: 10
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.5770e+10
AvailableMemory: 2.1349e+10
MultiprocessorCount: 30
ClockRateKHz: 1645000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
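
For reference, here is a rough back-of-the-envelope comparison of the two listings, written as a small Python sketch. It assumes that multiprocessor count times clock rate is a fair proxy for raw compute, which seems plausible since both cards report compute capability 6.1, but it ignores memory bandwidth, batch size, data transfer, and kernel-launch overhead. The dictionary keys and the function name are illustrative, not part of any MATLAB or CUDA API.

```python
# Values copied from the two gpuDevice listings above.
p2000 = {"sm_count": 8, "clock_khz": 1480500, "total_mem_bytes": 5.3687e9}
p6000 = {"sm_count": 30, "clock_khz": 1645000, "total_mem_bytes": 2.5770e10}


def raw_throughput(gpu):
    """Crude compute proxy: multiprocessors x clock rate.

    Assumption: both GPUs do the same work per multiprocessor per cycle,
    which is only reasonable because they share compute capability 6.1.
    """
    return gpu["sm_count"] * gpu["clock_khz"]


# Ratio of the crude compute proxy, P6000 over P2000.
compute_ratio = raw_throughput(p6000) / raw_throughput(p2000)

# Ratio of total device memory, P6000 over P2000.
memory_ratio = p6000["total_mem_bytes"] / p2000["total_mem_bytes"]

# Observed speedup from the benchmark described in this post (~8%).
observed_speedup = 1.08

print(f"Compute proxy ratio (P6000 / P2000): {compute_ratio:.2f}x")
print(f"Memory ratio (P6000 / P2000):        {memory_ratio:.2f}x")
print(f"Observed training speedup:           {observed_speedup:.2f}x")
```

On paper the P6000 comes out at roughly 4x the P2000 on both measures, so the ~8% I measure is far below what the raw specifications alone would suggest.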

Any help would be appreciated.
