Hi, when testing torch.from_numpy(input).float().cuda(), I found that performance deteriorates severely on one or a few iterations out of many runs. What could be the reason?
The test code is as follows (to rule out other effects, I added several extra synchronize calls):
import numpy as np
import torch
import time

def torch_from_numpy_test(input):
    torch.cuda.synchronize()
    T1 = time.perf_counter()
    # random_array is already float32, so .float() is a no-op here;
    # the timed work is essentially the host-to-device copy done by .cuda()
    input_torch = torch.from_numpy(input).float().cuda()
    torch.cuda.synchronize()
    T2 = time.perf_counter()
    print('torch.from_numpy time: %s ms' % ((T2 - T1) * 1000))
    out = input_torch.cpu().numpy().astype(np.float32)
    torch.cuda.synchronize()
    return out

random_array = np.random.rand(512, 384, 18).astype(np.float32)

def funA():
    all_peaks = []
    for part in range(18):
        # note: this slice along the last axis is a non-contiguous view
        map_ori = random_array[:, :, part]
        torch.cuda.synchronize()
        one_heatmap = torch_from_numpy_test(map_ori)
        torch.cuda.synchronize()
        all_peaks.append(one_heatmap)
    print(all_peaks[0][32][32])

funA()
The measured times are as follows:
torch.from_numpy time: 7.923923432826996 ms
torch.from_numpy time: 0.6886869668960571 ms
torch.from_numpy time: 0.5026236176490784 ms
torch.from_numpy time: 0.4269257187843323 ms
torch.from_numpy time: 0.42704492807388306 ms
torch.from_numpy time: 0.4085637629032135 ms
torch.from_numpy time: 0.3131367266178131 ms
torch.from_numpy time: 0.30875951051712036 ms
torch.from_numpy time: 0.30804798007011414 ms
torch.from_numpy time: 0.29971450567245483 ms
torch.from_numpy time: 0.29381364583969116 ms
torch.from_numpy time: 0.2989359200000763 ms
torch.from_numpy time: 0.7551312446594238 ms
torch.from_numpy time: 0.29600411653518677 ms
torch.from_numpy time: 0.292457640171051 ms
torch.from_numpy time: 75.8533850312233 ms
torch.from_numpy time: 0.6768964231014252 ms
torch.from_numpy time: 0.5467236042022705 ms
As you can see, the third-to-last iteration (75.85 ms) is more than two orders of magnitude slower than the steady-state iterations (~0.3 ms). I have already ruled out interference from CPU and GPU frequency scaling. The NVIDIA GPUs I tested on are the L20, L40, and Tesla V100.
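In case it helps with diagnosing, here is a cross-check I am considering; this is a minimal sketch under two assumptions I have not yet verified: that time.perf_counter() itself is not the source of the noise, and that the spikes come from the driver's pageable-memory staging path for the host-to-device copy. It times only the copy with CUDA events and uses a pinned (page-locked) host buffer:

import numpy as np
import torch

random_array = np.random.rand(512, 384, 18).astype(np.float32)

# CUDA events measure elapsed time on the GPU side, independent of the
# host wall clock used by time.perf_counter()
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

for part in range(18):
    # np.ascontiguousarray avoids silently timing a strided host copy,
    # since random_array[:, :, part] is a non-contiguous view
    map_ori = np.ascontiguousarray(random_array[:, :, part])
    # pin_memory() copies into page-locked host memory, so the H2D copy
    # below can bypass the driver's pageable staging buffer
    src = torch.from_numpy(map_ori).pin_memory()
    torch.cuda.synchronize()
    start.record()
    dst = src.cuda(non_blocking=True)
    end.record()
    torch.cuda.synchronize()
    print('pinned H2D copy: %.4f ms' % start.elapsed_time(end))

If the spikes disappear with this variant, that would point at pageable-memory staging (or host-side effects such as page faults) rather than at the GPU itself; if they persist, the slowdown is presumably somewhere in the copy path on the device or in the driver.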