torch.from_numpy().cuda() performance anomaly

Hi, when I was testing torch.from_numpy(input).float().cuda(), I found that performance deteriorates severely on one or a few calls out of many repeated runs. What could be the reason?

The test code is as follows (to rule out other effects, I added several extra synchronizations):

import numpy as np
import torch
import time

def torch_from_numpy_test(input):
    torch.cuda.synchronize()
    T1 = time.perf_counter()
    # Wrap the numpy array, cast to float32, and copy it to the GPU.
    input_torch = torch.from_numpy(input).float().cuda()
    torch.cuda.synchronize()
    T2 = time.perf_counter()
    print('torch.from_numpy time: %s ms' % ((T2 - T1) * 1000))
    # Copy the result back to the host.
    out = input_torch.cpu().numpy().astype(np.float32)
    torch.cuda.synchronize()
    return out

random_array = np.random.rand(512, 384, 18).astype(np.float32)

def funA():
    all_peaks = []
    for part in range(18):
        # Each slice is a non-contiguous (512, 384) view of the array.
        map_ori = random_array[:, :, part]
        torch.cuda.synchronize()
        one_heatmap = torch_from_numpy_test(map_ori)
        torch.cuda.synchronize()
        all_peaks.append(one_heatmap)
    print(all_peaks[0][32][32])

funA()

The timing output is as follows:

torch.from_numpy time: 7.923923432826996 ms
torch.from_numpy time: 0.6886869668960571 ms
torch.from_numpy time: 0.5026236176490784 ms
torch.from_numpy time: 0.4269257187843323 ms
torch.from_numpy time: 0.42704492807388306 ms
torch.from_numpy time: 0.4085637629032135 ms
torch.from_numpy time: 0.3131367266178131 ms
torch.from_numpy time: 0.30875951051712036 ms
torch.from_numpy time: 0.30804798007011414 ms
torch.from_numpy time: 0.29971450567245483 ms
torch.from_numpy time: 0.29381364583969116 ms
torch.from_numpy time: 0.2989359200000763 ms
torch.from_numpy time: 0.7551312446594238 ms
torch.from_numpy time: 0.29600411653518677 ms
torch.from_numpy time: 0.292457640171051 ms
torch.from_numpy time: 75.8533850312233 ms
torch.from_numpy time: 0.6768964231014252 ms
torch.from_numpy time: 0.5467236042022705 ms

You can see that the third-to-last call is dramatically slower: about 75 ms versus roughly 0.3 ms for the steady-state calls.
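To narrow down which step the spike comes from, I am thinking of timing the three stages separately. This is just a sketch (the helper name staged_timing_test is mine, not part of the test above):

import numpy as np
import torch
import time

def staged_timing_test(arr):
    # Hypothetical diagnostic: time each stage separately so a spike can be
    # attributed to from_numpy, the dtype cast, or the host-to-device copy.
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    t_cpu = torch.from_numpy(arr)   # zero-copy wrap of the numpy buffer
    t1 = time.perf_counter()
    t_cpu = t_cpu.float()           # dtype cast (returns the same tensor if already float32)
    t2 = time.perf_counter()
    t_gpu = t_cpu.cuda()            # host-to-device copy
    torch.cuda.synchronize()
    t3 = time.perf_counter()
    print('from_numpy: %.3f ms, float: %.3f ms, cuda: %.3f ms'
          % ((t1 - t0) * 1e3, (t2 - t1) * 1e3, (t3 - t2) * 1e3))
    return t_gpu

Calling it on the same slices, e.g. staged_timing_test(random_array[:, :, 0]), should show whether the occasional slow iteration sits in the conversion or in the copy.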

Interference from CPU and GPU frequency scaling has been ruled out. The NVIDIA GPU models I tested are L20, L40, and Tesla V100.
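Another variant I may try is staging the data through page-locked (pinned) host memory, to check whether the occasional spike is related to pageable host memory rather than the GPU itself. This is only a guess on my part; the sketch below (helper name pinned_copy_test is mine) is what I have in mind:

import numpy as np
import torch
import time

def pinned_copy_test(arr):
    # Hypothetical variant: make the slice contiguous, move it into pinned
    # host memory, then do the device copy from the pinned buffer.
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    pinned = torch.from_numpy(np.ascontiguousarray(arr)).float().pin_memory()
    gpu = pinned.cuda(non_blocking=True)
    torch.cuda.synchronize()
    t1 = time.perf_counter()
    print('pinned-path time: %.3f ms' % ((t1 - t0) * 1e3))
    return gpu

If the pinned-memory path no longer shows the outliers, that would point toward the host side of the transfer rather than the GPUs themselves.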

I usually suggest that people with PyTorch questions ask on a PyTorch forum such as discuss.pytorch.org. There are NVIDIA experts on that forum.