Hello, I’m trying to optimize 1-bit tensor-core-heavy code on an RTX 6000 Ada, and now that the L40S has been introduced, I’m wondering whether deployment differences might crop up between the RTX 6000 Ada, the L40, and the L40S.
For those of you who haven’t looked closely, the L40 and L40S data sheets strongly imply that the L40S tensor cores are twice as fast as the L40’s, except for INT4 (weird). INT1 tensor performance is left unspecified for all three GPUs.
What’s going on? More directly, is INT1 tensor performance the same on the L40, L40S, and RTX 6000 Ada, or is one significantly faster or slower than the others?