Hi there,
I am trying to benchmark the two GPUs available in my team for LSTM training: an NVIDIA Quadro P2000 and a Quadro P6000.
The first benchmark results show only a small training-speed advantage (~8%) for the P6000 over the P2000, much less than I would expect given the P6000's larger memory. Is this a known and accepted level of performance for a P6000 when used for RNN training, or am I missing something in the GPU setup?
I am working with:
- Environment: MATLAB R2019a
- CUDA: cuda_11.0.2_451.48_win10
- cuDNN libraries: cudnn-11.0-windows-x64-v8.0.1.13
Information for the 1st GPU:
Name: 'Quadro P2000'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 10.1000
ToolkitVersion: 10
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 5.3687e+09
AvailableMemory: 4.1733e+09
MultiprocessorCount: 8
ClockRateKHz: 1480500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Information for the 2nd GPU:
Name: 'Quadro P6000'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11
ToolkitVersion: 10
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.5770e+10
AvailableMemory: 2.1349e+10
MultiprocessorCount: 30
ClockRateKHz: 1645000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
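To put rough numbers on what I was expecting, here is a small sketch that compares the two listings above. The figures are copied from the gpuDevice output; the variable names and the "multiprocessor count times clock rate" proxy are my own assumptions, not an official performance model, and they say nothing about how well an LSTM workload actually keeps the GPU busy.

```python
# Values copied from the two gpuDevice listings above.
p2000 = {"sm_count": 8, "clock_khz": 1480500, "total_memory_bytes": 5.3687e9}
p6000 = {"sm_count": 30, "clock_khz": 1645000, "total_memory_bytes": 2.5770e10}


def ratio(field):
    """Return the P6000 / P2000 ratio for one listed field."""
    return p6000[field] / p2000[field]


sm_ratio = ratio("sm_count")
clock_ratio = ratio("clock_khz")
memory_ratio = ratio("total_memory_bytes")

# Crude proxy (assumption): peak throughput scales with
# multiprocessor count times clock rate. Both cards report the same
# compute capability (6.1), so per-multiprocessor work is treated as equal.
compute_ratio = sm_ratio * clock_ratio

print(f"Multiprocessor ratio: {sm_ratio:.2f}x")
print(f"Clock ratio:          {clock_ratio:.2f}x")
print(f"Memory ratio:         {memory_ratio:.2f}x")
print(f"Rough compute ratio:  {compute_ratio:.2f}x")
```

On paper this gives the P6000 roughly 4x the raw compute of the P2000, which is why the ~8% difference I measured surprises me.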
Any help would be appreciated.