Here is my code for running ViT inference.
import time

import torch
from PIL import Image

for i in range(100):
    s_time0 = time.time()

    # load image from disk
    image = Image.open(image_file).convert('RGB')
    print('open img time:', int((time.time() - s_time0) * 1000))

    # preprocess into a numpy array
    s_time = time.time()
    image_tensor = process_anyres_image(
        image, model.image_processor, grid_points, False, False
    )
    print('process time:', int((time.time() - s_time) * 1000))

    # wrap the numpy array as a torch tensor
    s_time = time.time()
    image_tensor = torch.from_numpy(image_tensor)
    print('array to tensor time:', int((time.time() - s_time) * 1000))

    # move to GPU and cast to fp16
    s_time = time.time()
    image_tensor = image_tensor.to('cuda', dtype=torch.float16)
    print('to gpu time:', int((time.time() - s_time) * 1000))

    # ViT forward pass
    s_time = time.time()
    tokens = model(image_tensor)  # torch.Size([1, 3, 224, 224])
    endtime = time.time()
    print('forward time:', int((endtime - s_time) * 1000))
    print('total:', int((endtime - s_time0) * 1000))
    print('-----------------------')
The output looks like this.
It seems like tensor.to('cuda') takes a long time on every iteration except the first two.
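Since CUDA work runs asynchronously, I am not sure my host-side timers attribute the time to the right step, so here is a rough sketch (based on the loop above) of how I would re-time just the transfer with explicit synchronization; `image_tensor` is assumed to be the CPU tensor produced in the loop:

```python
import time

import torch

# Sketch: time only the host-to-device copy, with explicit synchronization,
# so any GPU work still queued from earlier calls is not billed to .to('cuda').
# Assumes `image_tensor` is the CPU tensor from the preprocessing step above.
torch.cuda.synchronize()  # drain any previously queued GPU work
t0 = time.time()
gpu_tensor = image_tensor.to('cuda', dtype=torch.float16)
torch.cuda.synchronize()  # wait until the copy itself has finished
print('to gpu (synced) ms:', int((time.time() - t0) * 1000))
```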
Here is the information about my machine and the PyTorch version.
![image](https://global.discourse-cdn.com/nvidia/original/4X/2/7/6/276daa6e55277d2018df6b2141bd7a6291d9485b.png)
Is this normal?
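If the copy itself turns out to be the bottleneck, a variant I could try (just a sketch, not yet measured on this machine) is pinning the host tensor and using a non-blocking transfer:

```python
import torch

# Sketch: pin the host tensor so the host-to-device copy can overlap with other work.
# Assumes `image_tensor` is the CPU tensor produced in the loop above.
pinned = image_tensor.pin_memory()
gpu_tensor = pinned.to('cuda', dtype=torch.float16, non_blocking=True)
torch.cuda.synchronize()  # ensure the asynchronous copy has completed before using the tensor
```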