Hi all,
I’m encountering a significant performance discrepancy between our server with three NVIDIA RTX A6000 GPUs and a PC with a single NVIDIA GeForce RTX 3090. Although I would expect the benchmarks to be reasonably close (each GPU is tested independently), the RTX A6000s underperform the RTX 3090 by a large margin. Below are the details of the tests and the results we’ve obtained.
Benchmark Test Details
We ran a Python benchmark on both systems, performing matrix multiplication on two large random matrices. The script is as follows:
import torch
import time

def benchmark(device):
    if device != 'cpu':
        torch.cuda.set_device(device)
        device_name = torch.cuda.get_device_name(device)
    else:
        device_name = 'CPU'
    print(f"Running benchmark on {device_name}")

    size = 10000
    iterations = 10

    # Two large random matrices on the target device
    tensor_a = torch.randn(size, size, device=device)
    tensor_b = torch.randn(size, size, device=device)

    # Warm-up iterations (not timed)
    for _ in range(10):
        _ = torch.mm(tensor_a, tensor_b)

    # Timed iterations
    start_time = time.time()
    for _ in range(iterations):
        _ = torch.mm(tensor_a, tensor_b)
    end_time = time.time()

    avg_time = (end_time - start_time) / iterations
    print(f"Average time for matrix multiplication of {size}x{size} tensors on {device_name}: {avg_time:.6f} seconds")

if __name__ == "__main__":
    if torch.cuda.is_available():
        num_gpus = torch.cuda.device_count()
        print(f"Number of GPUs available: {num_gpus}")
        for i in range(num_gpus):
            benchmark(i)
    else:
        print("CUDA is not available. Running benchmark on CPU only.")
        benchmark('cpu')
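For completeness, a variant of the timing loop that calls torch.cuda.synchronize() around the timed region is sketched below. This is only a sketch of an alternative measurement (it assumes the tensors live on a CUDA device); all numbers reported in the next section come from the script above as shown.

import time
import torch

def timed_matmul(tensor_a, tensor_b, iterations=10):
    # Sketch only: wait for any pending GPU work before starting the clock
    torch.cuda.synchronize()
    start_time = time.time()
    for _ in range(iterations):
        _ = torch.mm(tensor_a, tensor_b)
    # Wait for all launched matmuls to finish before stopping the clock
    torch.cuda.synchronize()
    return (time.time() - start_time) / iterations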
Benchmark Results
PC with one GPU (RTX 3090):
- Number of GPUs available: 1
- GPU (RTX 3090): Average time: 0.000008 seconds
- CPU: Average time: 0.918097 seconds
Server with 3 GPUs (RTX A6000):
- Number of GPUs available: 3
- GPU 1 (RTX A6000): Average time: 0.000015 seconds
- GPU 2 (RTX A6000): Average time: 0.000014 seconds
- GPU 3 (RTX A6000): Average time: 0.000014 seconds
- CPU: Average time: 0.693448 seconds
We also conducted similar MATLAB tests (multiplying two large random matrices), focusing on simple GPU operations:
MATLAB Results:
- RTX 3090: Computation time: 0.48609 seconds
- RTX A6000: Computation time: 1.0159 seconds
System Specifications
PC with one GPU (RTX 3090):
- CPU: 12 cores @ 3.5GHz
- GPU: NVIDIA GeForce RTX 3090 (24GB Memory)
- Ubuntu 20.04.6 LTS
- CUDA Version: 12.2
- Driver Version: 535.183.01
- Python version: 3.9.7
- MATLAB version: 2021a
Server with 3 GPUs (RTX A6000):
- CPU: 2 sockets, 16 cores each @ 2.4GHz
- GPU: 3x NVIDIA RTX A6000 (48GB Memory each)
- Ubuntu 22.04.4 LTS
- CUDA Version: 12.2
- Driver Version: 535.183.01
- Python version: 3.9.7
- MATLAB version: 2021a
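For reference, the Python-side details above can be confirmed on each machine with a short snippet like the one below. Note that torch.version.cuda reports the CUDA version PyTorch was built against, which can differ from the driver-reported CUDA version shown by nvidia-smi.

import sys
import torch

# Print interpreter, PyTorch build, CUDA build version, and visible GPUs
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA (PyTorch build):", torch.version.cuda)
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}:", torch.cuda.get_device_name(i))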
The performance difference between the RTX A6000 and RTX 3090 is unexpected. We are seeking advice on:
- Potential bottlenecks or misconfigurations.
- Additional tests or diagnostics to perform.
- Any insights into why the RTX A6000 might underperform compared to the RTX 3090 in our use case.
Your expertise and suggestions would be immensely valuable in helping us resolve this issue. Thank you in advance for your assistance!
Best regards,
Zihan