[H200] Bug report: unexpected FP8 results from torch._scaled_mm

Hello, I am running some experiments on an H200 to become familiar with it.

I think I have found some bugs on the H200. Can anyone explain the cause?

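As far as I know, the result should be 8.0, not 5.5, for bug 1, and 8.0, not 4.0, for bug 2. Every element is 2**(-9), so each product is 2**(-18), and summing 2**21 of them gives 8.0 (the 2**4 extra terms in bug 2 are too small to change the FP8 result).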
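To spell out where the expected 8.0 comes from, here is a minimal sketch in plain Python (no GPU or PyTorch needed). It uses exact rational arithmetic, so it only shows the mathematically exact dot product, not what any particular hardware accumulator does. The helper name `exact_dot` is made up for this illustration.

```python
from fractions import Fraction

# Every element of a and b is 2**-9. In float8_e4m3fn (exponent bias 7,
# 3 mantissa bits) this is the smallest positive subnormal:
# 2**(1 - 7) * 1/8 = 2**-9, so the conversion to FP8 loses nothing.
x = Fraction(1, 2**9)
smallest_subnormal = Fraction(1, 2**6) * Fraction(1, 8)
assert x == smallest_subnormal

def exact_dot(k):
    """Exact dot product of two length-k vectors whose entries are all 2**-9."""
    return k * x * x

bug1_expected = exact_dot(2**21)          # inner dimension used in bug 1
bug2_expected = exact_dot(2**21 + 2**4)   # inner dimension used in bug 2

print(float(bug1_expected))  # 8.0
print(float(bug2_expected))  # 8.00006103515625
```

The second value is slightly above 8.0, but the nearest float8_e4m3fn value is still 8.0, which is why I expect 8.0 in both cases.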

Thanks for your help.

<Bug 1 Code details>

import torch

# a: 2**21 elements, each 2**(-9); b: 16 identical rows of the same values.
a = torch.tensor([1.0 * 2**(-9)] * 0 + [1.0 * 2**(-9)] * (2**21), dtype=torch.float32, device='cuda')
b = torch.tensor([[1.0 * 2**(-9)] * 0 + [1.0 * 2**(-9)] * (2**21)]*16, dtype=torch.float32, device='cuda')

# Convert both operands to FP8 (e4m3fn); a becomes a 1 x 2**21 matrix.
a = a.to(torch.float8_e4m3fn)
a = a.unsqueeze(0)
b = b.to(torch.float8_e4m3fn)

result = torch._scaled_mm(
    input=a,
    mat2=b.t(),
    scale_a=torch.tensor([1.0], dtype=torch.float32, device='cuda'),
    scale_b=torch.tensor([1.0], dtype=torch.float32, device='cuda'),
    #bias=
    #scale_result=
    #out_dtype=
    use_fast_accum=True
)

print("result:", result[0][0])

<Bug 1 Printed result>

result: tensor(5.5000000000, device='cuda:0', dtype=torch.float8_e4m3fn)

<Bug 2 Code details>

import torch

# Same as bug 1, but with 2**4 extra elements: 2**21 + 2**4 values of 2**(-9).
a = torch.tensor([1.0 * 2**(-9)] * 0 + [1.0 * 2**(-9)] * (2**21 + 2**4), dtype=torch.float32, device='cuda')
b = torch.tensor([[1.0 * 2**(-9)] * 0 + [1.0 * 2**(-9)] * (2**21 + 2**4)]*16, dtype=torch.float32, device='cuda')

# Convert both operands to FP8 (e4m3fn); a becomes a 1 x (2**21 + 2**4) matrix.
a = a.to(torch.float8_e4m3fn)
a = a.unsqueeze(0)
b = b.to(torch.float8_e4m3fn)

result = torch._scaled_mm(
    input=a,
    mat2=b.t(),
    scale_a=torch.tensor([1.0], dtype=torch.float32, device='cuda'),
    scale_b=torch.tensor([1.0], dtype=torch.float32, device='cuda'),
    #bias=
    #scale_result=
    #out_dtype=
    use_fast_accum=True
)

print("result:", result[0][0])

<Bug 2 Printed result>

result: tensor(4., device='cuda:0', dtype=torch.float8_e4m3fn)
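To check whether the FP8 output format alone could explain these numbers, here is a rough sketch in plain Python that enumerates the non-negative finite float8_e4m3fn values and prints which real values would round to a given output. It assumes the final conversion rounds to the nearest representable value (tie handling is ignored), and the helper names `e4m3fn_values` and `rounding_interval` are made up for this illustration.

```python
def e4m3fn_values():
    """Non-negative finite float8_e4m3fn values (4 exponent bits, 3 mantissa bits, bias 7)."""
    values = []
    for exp in range(16):
        for man in range(8):
            if exp == 15 and man == 7:
                continue  # this bit pattern encodes NaN in the "fn" variant
            if exp == 0:
                values.append(man / 8 * 2.0**-6)  # zero and subnormals
            else:
                values.append((1 + man / 8) * 2.0**(exp - 7))
    return values  # generated in ascending order

def rounding_interval(y):
    """Range of real values whose nearest e4m3fn value is y (ties ignored)."""
    vals = e4m3fn_values()
    i = vals.index(y)
    return (vals[i - 1] + y) / 2, (y + vals[i + 1]) / 2

print(rounding_interval(5.5))  # (5.25, 5.75)
print(rounding_interval(4.0))  # (3.875, 4.25)
print(rounding_interval(8.0))  # (7.75, 8.5)
```

Under that assumption, the value before the final rounding would have been roughly 5.25 to 5.75 in bug 1 and 3.875 to 4.25 in bug 2, both well outside the 7.75 to 8.5 range that rounds to 8.0. So the gap does not look like output rounding; it seems to arise earlier, during accumulation.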

You might get better help with PyTorch questions by asking them on a PyTorch forum, such as discuss.pytorch.org.