torch.from_numpy().cuda() performance anomaly

Hi, when I was testing torch.from_numpy(input).float().cuda(), I found that performance deteriorates severely on one or a few calls out of many repeated runs. What could be the reason?

The test code is as follows (to rule out other effects, I added several extra synchronizations):

import numpy as np
import torch
import time

def torch_from_numpy_test(input):
    torch.cuda.synchronize()
    T1 = time.perf_counter()
    # Wrap the numpy array, cast to float32, and copy it to the GPU.
    input_torch = torch.from_numpy(input).float().cuda()
    torch.cuda.synchronize()
    T2 = time.perf_counter()
    print('torch.from_numpy time: %s ms' % ((T2 - T1) * 1000))
    # Copy the result back to the host.
    out = input_torch.cpu().numpy().astype(np.float32)
    torch.cuda.synchronize()
    return out

random_array = np.random.rand(512, 384, 18).astype(np.float32)

def funA():
    all_peaks = []
    for part in range(18):
        # Each slice is a non-contiguous (512, 384) view of the array.
        map_ori = random_array[:, :, part]
        torch.cuda.synchronize()
        one_heatmap = torch_from_numpy_test(map_ori)
        torch.cuda.synchronize()
        all_peaks.append(one_heatmap)
    print(all_peaks[0][32][32])

funA()

The timing output is as follows:

torch.from_numpy time: 7.923923432826996 ms
torch.from_numpy time: 0.6886869668960571 ms
torch.from_numpy time: 0.5026236176490784 ms
torch.from_numpy time: 0.4269257187843323 ms
torch.from_numpy time: 0.42704492807388306 ms
torch.from_numpy time: 0.4085637629032135 ms
torch.from_numpy time: 0.3131367266178131 ms
torch.from_numpy time: 0.30875951051712036 ms
torch.from_numpy time: 0.30804798007011414 ms
torch.from_numpy time: 0.29971450567245483 ms
torch.from_numpy time: 0.29381364583969116 ms
torch.from_numpy time: 0.2989359200000763 ms
torch.from_numpy time: 0.7551312446594238 ms
torch.from_numpy time: 0.29600411653518677 ms
torch.from_numpy time: 0.292457640171051 ms
torch.from_numpy time: 75.8533850312233 ms
torch.from_numpy time: 0.6768964231014252 ms
torch.from_numpy time: 0.5467236042022705 ms

You can see that the third-to-last call is dramatically slower: about 75 ms versus roughly 0.3 ms for the steady-state calls.
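To narrow down which step the spike comes from, I am thinking of timing the three stages separately. This is just a sketch (the helper name staged_timing_test is mine, not part of the test above):

import numpy as np
import torch
import time

def staged_timing_test(arr):
    # Hypothetical diagnostic: time each stage separately so a spike can be
    # attributed to from_numpy, the dtype cast, or the host-to-device copy.
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    t_cpu = torch.from_numpy(arr)   # zero-copy wrap of the numpy buffer
    t1 = time.perf_counter()
    t_cpu = t_cpu.float()           # dtype cast (returns the same tensor if already float32)
    t2 = time.perf_counter()
    t_gpu = t_cpu.cuda()            # host-to-device copy
    torch.cuda.synchronize()
    t3 = time.perf_counter()
    print('from_numpy: %.3f ms, float: %.3f ms, cuda: %.3f ms'
          % ((t1 - t0) * 1e3, (t2 - t1) * 1e3, (t3 - t2) * 1e3))
    return t_gpu

Calling it on the same slices, e.g. staged_timing_test(random_array[:, :, 0]), should show whether the occasional slow iteration sits in the conversion or in the copy.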

Interference from CPU and GPU frequency scaling has been ruled out. The NVIDIA GPU models I tested are L20, L40, and Tesla V100.
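Another variant I may try is staging the data through page-locked (pinned) host memory, to check whether the occasional spike is related to pageable host memory rather than the GPU itself. This is only a guess on my part; the sketch below (helper name pinned_copy_test is mine) is what I have in mind:

import numpy as np
import torch
import time

def pinned_copy_test(arr):
    # Hypothetical variant: make the slice contiguous, move it into pinned
    # host memory, then do the device copy from the pinned buffer.
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    pinned = torch.from_numpy(np.ascontiguousarray(arr)).float().pin_memory()
    gpu = pinned.cuda(non_blocking=True)
    torch.cuda.synchronize()
    t1 = time.perf_counter()
    print('pinned-path time: %.3f ms' % ((t1 - t0) * 1e3))
    return gpu

If the pinned-memory path no longer shows the outliers, that would point toward the host side of the transfer rather than the GPUs themselves.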

I usually suggest that people with PyTorch questions ask on a PyTorch forum such as discuss.pytorch.org. There are NVIDIA experts on that forum.