Is this normal? When I run a torch model on an A30 GPU's MIG instances, the features obtained on 1g.6gb and 2g.12gb are inconsistent

First, I defined a simple network:
import torch
import torch.nn as nn

class SimpleConvNet(nn.Module):
    def __init__(self):
        super(SimpleConvNet, self).__init__()
        self.stn_fc1 = nn.Sequential(
            nn.Linear(2 * 256, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True))

    def forward(self, x):
        batch_size, _, h, w = x.size()
        # flatten everything except the batch dimension before the FC block
        x = x.view(batch_size, -1)
        x = self.stn_fc1(x)
        return x
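
The flattened input must have 2 * 256 = 512 features, so for example an input of shape (N, 2, 16, 16) matches the first Linear layer (the exact shape is just an assumption for this sanity check; only the flattened size matters):

# shape sanity check: any (N, C, H, W) input with C*H*W == 512 works
dummy = torch.randn(4, 2, 16, 16)   # 2 * 16 * 16 == 512
net = SimpleConvNet().eval()
print(net(dummy).shape)             # torch.Size([4, 512])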

Then, with the same parameters and the same input, I performed inference separately on the 2g.12gb and 1g.6gb instances to obtain output_12g.npy and output_6g.npy.

import numpy as np
import torch

model_test = SimpleConvNet()
model_test.load_state_dict(torch.load('./model_params.pth'))
model_test = model_test.cuda()
model_test.eval()  # use running BatchNorm statistics for inference

input_data = np.load('./input_data.npy')
input_data = torch.from_numpy(input_data).cuda()

## export CUDA_VISIBLE_DEVICES='MIG-29a86f08-9dda-59b4-a2b5-40f5dc21b648'
with torch.no_grad():
    output_12g = model_test(input_data)
save_feature(output_12g, './output_12g.npy')

## export CUDA_VISIBLE_DEVICES='MIG-36832a9a-4921-540a-96b8-ba6ecc38e4e2'
with torch.no_grad():
    output_6g = model_test(input_data)
save_feature(output_6g, './output_6g.npy')
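
Here save_feature is a small helper along these lines (a sketch; it just needs to detach the tensor and save it as a NumPy array):

def save_feature(tensor, path):
    # detach from the graph, move to CPU, and save as .npy
    np.save(path, tensor.detach().cpu().numpy())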

I compared the two features element by element and found differences around the sixth decimal place at many indices. Is this normal? For example:

Inconsistent element at index (0, 1):
output_12g_cpu: 0.7465695738792419
output_6g_cpu: 0.7465693950653076
Inconsistent element at index (0, 2):
output_12g_cpu: 1.231195092201233
output_6g_cpu: 1.231195330619812
Inconsistent element at index (0, 4):
output_12g_cpu: 0.314302921295166
output_6g_cpu: 0.3143029808998108
Inconsistent element at index (0, 5):
output_12g_cpu: 1.0248600244522095
output_6g_cpu: 1.0248603820800781
Inconsistent element at index (0, 6):
output_12g_cpu: 1.4555572271347046
output_6g_cpu: 1.4555573463439941
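
The element-wise comparison was done along these lines (a minimal sketch; the atol value is just illustrative):

import numpy as np

output_12g_cpu = np.load('./output_12g.npy')
output_6g_cpu = np.load('./output_6g.npy')

# flag every element whose absolute difference exceeds the tolerance
mismatch = ~np.isclose(output_12g_cpu, output_6g_cpu, rtol=0, atol=1e-7)
for idx in zip(*np.nonzero(mismatch)):
    print(f'Inconsistent element at index {idx}:')
    print(f'  output_12g_cpu: {output_12g_cpu[idx]}')
    print(f'  output_6g_cpu: {output_6g_cpu[idx]}')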

Is this difference caused by the different compute capacity of the two MIG instances?