Is there a reference table anywhere, where one can take the LUT value from a SASS dumped LOP3.LUT instruction and see what the actual logic ops are?
Thanks,
Any number of different sequences of logic operations can map to the same truth table, i.e. this is a many-to-one mapping, not a bijection. Therefore one cannot unambiguously map a truth-table value back to a particular sequence of logic operations.
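To make the many-to-one point concrete, here is a small sketch (the function names are made up for illustration). Three differently written formulas for the majority function all evaluate to the same truth table when fed the usual LOP3 input constants a = 0xf0, b = 0xcc, c = 0xaa:

```c
#include <stdint.h>

/* Standard LOP3 input constants: bit i of each is the value of that input
   for the input combination i = 4*a + 2*b + c. */
enum { TT_A = 0xf0, TT_B = 0xcc, TT_C = 0xaa };

/* Three different-looking formulas for majority(a, b, c). */
static uint8_t maj_sum_of_products (uint32_t a, uint32_t b, uint32_t c)
{
    return (uint8_t)((a & b) | (a & c) | (b & c));
}

static uint8_t maj_factored (uint32_t a, uint32_t b, uint32_t c)
{
    return (uint8_t)((a & (b | c)) | (b & c));
}

static uint8_t maj_with_xor (uint32_t a, uint32_t b, uint32_t c)
{
    return (uint8_t)(((a ^ b) & c) | (a & b));
}
```

Each of these returns 0xe8 for (TT_A, TT_B, TT_C), so a LOP3.LUT with 0xe8 in a SASS dump identifies the function, but not which of these (or any other) source expressions produced it.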
But one can certainly set up a logic equation corresponding to a LOP3.LUT truth table (see code below) and then transform it into whatever form is desired (e.g. DNF, CNF, NAND).
```c
#include <stdint.h>

/* emulate GPU's LOP3.LUT (three-input logic op with 8-bit truth table);
   bit i of ttbl is the output for the inputs (a, b, c) with i = 4*a + 2*b + c */
uint32_t lop3_fast (uint32_t a, uint32_t b, uint32_t c, uint8_t ttbl)
{
    uint32_t r = 0;
    if (ttbl & 0x01) r |= ~a & ~b & ~c;
    if (ttbl & 0x02) r |= ~a & ~b &  c;
    if (ttbl & 0x04) r |= ~a &  b & ~c;
    if (ttbl & 0x08) r |= ~a &  b &  c;
    if (ttbl & 0x10) r |=  a & ~b & ~c;
    if (ttbl & 0x20) r |=  a & ~b &  c;
    if (ttbl & 0x40) r |=  a &  b & ~c;
    if (ttbl & 0x80) r |=  a &  b &  c;
    return r;
}
```
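Going the other way, from a LUT value read off a SASS dump to something printable, the simplest mechanical answer is the canonical DNF: OR together one minterm per set bit, just as lop3_fast() does. A minimal sketch using the same bit ordering; the helper name lop3_to_dnf and the LOP3_DNF_MAX constant are made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* 8 minterms of at most 14 chars, 7 " | " separators, terminating NUL */
#define LOP3_DNF_MAX 160

/* Write the canonical DNF (OR of minterms) of an 8-bit LOP3 truth table into out.
   Bit i of ttbl is the output for (a, b, c) = (bit 2, bit 1, bit 0) of i,
   the same ordering as lop3_fast(). */
static void lop3_to_dnf (uint8_t ttbl, char out[LOP3_DNF_MAX])
{
    out[0] = '\0';
    if (ttbl == 0x00) {            /* no minterms: constant false */
        strcpy (out, "0");
        return;
    }
    for (int i = 0; i < 8; i++) {
        if (!((ttbl >> i) & 1))
            continue;              /* output is 0 for this input combination */
        if (out[0] != '\0')
            strcat (out, " | ");
        strcat (out, (i & 4) ? "(a" : "(~a");
        strcat (out, (i & 2) ? " & b" : " & ~b");
        strcat (out, (i & 1) ? " & c)" : " & ~c)");
    }
}
```

For example, lop3_to_dnf(0xe8, buf) yields "(~a & b & c) | (a & ~b & c) | (a & b & ~c) | (a & b & c)". That is correct but not the most readable form of majority; reducing it to a minimal form is a separate step (Karnaugh map, Quine-McCluskey, etc.).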
Thanks, I should have seen that (the many-to-one mapping).
I’m trying to optimise, and the LOP3s are making things somewhat opaque.
For the record: Intel offer an AVX-512 instruction, VPTERNLOG, that performs the same function as LOP3, and they publish a couple of tables listing all 256 logic-op combinations.
Currently the tables are at the beginning of “Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2C: Instruction Set Reference, V-Z”, available here.
@rs277 The canonicalization rules used to generate tables 5-1 and 5-2 in the Intel manual are not clear. Other than for the simplest cases using OR and AND, I am not sure how useful these tables are. I do not usually think in terms of NAND, NOR, and XNOR.
For the forward direction (3-input logic formula to 8-bit truth table) there is this, of course:
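```c
#include <stdio.h>
int main (void)
{
    unsigned int a = 0xf0, b = 0xcc, c = 0xaa;
    /* substitute any logic formula in a, b, c; mask to 8 bits because ~ sets the upper bits */
    printf ("%02x\n", ((a & b) | ~c) & 0xff);
    return 0;
}
```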
No dark secrets here, and this is also documented in the PTX manual.
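The same forward trick can generate a small lookup table of frequently seen LUT values rather than typing them in by hand. A sketch under the a = 0xf0, b = 0xcc, c = 0xaa convention; the choice of entries, the maj() notation and the names lop3_common/lop3_lookup are my own, not anything from NVIDIA or Intel:

```c
#include <stddef.h>
#include <stdint.h>

/* Standard LOP3 input constants: evaluating a formula on them yields its LUT. */
#define LOP3_A 0xf0u
#define LOP3_B 0xccu
#define LOP3_C 0xaau
/* Mask to 8 bits, since ~ on an unsigned int also sets the upper 24 bits. */
#define LOP3_LUT(expr) ((uint8_t)((expr) & 0xffu))

struct lop3_entry {
    const char *formula;
    uint8_t lut;
};

/* A few common formulas; the LUT values are computed by the compiler. */
static const struct lop3_entry lop3_common[] = {
    { "a & b & c",    LOP3_LUT (LOP3_A & LOP3_B & LOP3_C) },
    { "a | b | c",    LOP3_LUT (LOP3_A | LOP3_B | LOP3_C) },
    { "a ^ b ^ c",    LOP3_LUT (LOP3_A ^ LOP3_B ^ LOP3_C) },
    { "(a & b) | c",  LOP3_LUT ((LOP3_A & LOP3_B) | LOP3_C) },
    { "(a & b) ^ c",  LOP3_LUT ((LOP3_A & LOP3_B) ^ LOP3_C) },
    { "a ? b : c",    LOP3_LUT ((LOP3_A & LOP3_B) | (~LOP3_A & LOP3_C)) },
    { "maj(a, b, c)", LOP3_LUT ((LOP3_A & LOP3_B) | (LOP3_A & LOP3_C) | (LOP3_B & LOP3_C)) },
    { "~(a | b | c)", LOP3_LUT (~(LOP3_A | LOP3_B | LOP3_C)) },
};

/* Return the formula for a LUT value, or NULL if it is not in this short list. */
static const char *lop3_lookup (uint8_t lut)
{
    for (size_t i = 0; i < sizeof lop3_common / sizeof lop3_common[0]; i++)
        if (lop3_common[i].lut == lut)
            return lop3_common[i].formula;
    return NULL;
}
```

In table order, the computed values are 0x80, 0xfe, 0x96, 0xea, 0x6a, 0xca, 0xe8 and 0x01.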
After spending some more time looking into this, I have concluded that mapping arbitrary logic (of, say, 5 variables) onto LOP3 instructions in an optimal fashion is a very hard problem indeed.
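An upper bound, as opposed to an optimal mapping, is easy to get. Shannon expansion on one input d, f = d ? f1(a, b, c) : f0(a, b, c), turns any 4-input function into two LOP3s plus one select, and the select is itself a LOP3 with LUT 0xca (a ? b : c). The sketch below models LOP3 in software; lop3 is a loop version of lop3_fast, and lut4_via_lop3 and its 16-bit table layout are my own illustrative choices:

```c
#include <stdint.h>

/* Software model of LOP3.LUT: bit i of ttbl is the output for i = 4*a + 2*b + c. */
static uint32_t lop3 (uint32_t a, uint32_t b, uint32_t c, uint8_t ttbl)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i++)
        if ((ttbl >> i) & 1)
            r |= ((i & 4) ? a : ~a) & ((i & 2) ? b : ~b) & ((i & 1) ? c : ~c);
    return r;
}

/* Evaluate a 4-input function given as a 16-bit truth table (bit index
   8*a + 4*b + 2*c + d) using three LOP3s, via Shannon expansion on d. */
static uint32_t lut4_via_lop3 (uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                               uint16_t tt16)
{
    uint8_t t0 = 0, t1 = 0;   /* 3-input cofactors for d = 0 and d = 1 */
    for (int i = 0; i < 8; i++) {
        t0 |= (uint8_t)(((tt16 >> (2 * i)) & 1) << i);
        t1 |= (uint8_t)(((tt16 >> (2 * i + 1)) & 1) << i);
    }
    uint32_t f0 = lop3 (a, b, c, t0);
    uint32_t f1 = lop3 (a, b, c, t1);
    return lop3 (d, f1, f0, 0xca);   /* 0xca: d ? f1 : f0, bitwise */
}
```

Feeding in a = 0xff00, b = 0xf0f0, c = 0xcccc, d = 0xaaaa returns tt16 itself in the low 16 bits, which is a convenient self-check. Splitting once more gives at most 2 × 3 + 1 = 7 LOP3s for 5 inputs, but that is only a bound and says nothing about the minimum.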
No, and I’ve referred to that PTX manual formula before.
For me, it’s been more a case of this: when examining SASS with a view to optimising, it is handy to know which Boolean function is actually being computed.
I found a more readable version of the same thing here.
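On the readability point: a DNF dump is mechanical but rarely pleasant to read. One cheap alternative, sketched here under my own assumptions, is a brute-force search over tiny expression shapes (a literal, x OP y, or (x OP y) OP z, with OP one of &, |, ^ and literals a, b, c or their complements) that returns the first formula whose LUT matches. The names and the grammar are hypothetical choices, and it does not cover every function: majority (0xe8) and a ? b : c (0xca) have no formula of this shape, so a real tool would fall back to the DNF or a proper minimiser in that case.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LOP3_EXPR_MAX 32   /* longest output, "(~a ^ ~b) ^ ~c", is 14 chars */

/* Literals and their truth tables under a = 0xf0, b = 0xcc, c = 0xaa. */
static const char *const lit_name[6] = { "a", "~a", "b", "~b", "c", "~c" };
static const uint8_t lit_val[6] = { 0xf0, 0x0f, 0xcc, 0x33, 0xaa, 0x55 };
static const char op_sym[3] = { '&', '|', '^' };

static uint8_t apply_op (int op, uint8_t x, uint8_t y)
{
    switch (op) {
    case 0:  return (uint8_t)(x & y);
    case 1:  return (uint8_t)(x | y);
    default: return (uint8_t)(x ^ y);
    }
}

/* Search, smallest shape first, for a formula with truth table lut.
   Returns 1 and writes the formula to out on success, 0 if none of the
   shapes (literal, x OP y, (x OP y) OP z) matches. */
static int lop3_find_simple (uint8_t lut, char out[LOP3_EXPR_MAX])
{
    if (lut == 0x00 || lut == 0xff) {
        strcpy (out, lut ? "1" : "0");
        return 1;
    }
    for (int i = 0; i < 6; i++)
        if (lit_val[i] == lut) {
            strcpy (out, lit_name[i]);
            return 1;
        }
    for (int i = 0; i < 6; i++)
        for (int j = 0; j < 6; j++)
            for (int p = 0; p < 3; p++)
                if (apply_op (p, lit_val[i], lit_val[j]) == lut) {
                    snprintf (out, LOP3_EXPR_MAX, "%s %c %s",
                              lit_name[i], op_sym[p], lit_name[j]);
                    return 1;
                }
    for (int i = 0; i < 6; i++)
        for (int j = 0; j < 6; j++)
            for (int p = 0; p < 3; p++)
                for (int k = 0; k < 6; k++)
                    for (int q = 0; q < 3; q++)
                        if (apply_op (q, apply_op (p, lit_val[i], lit_val[j]),
                                      lit_val[k]) == lut) {
                            snprintf (out, LOP3_EXPR_MAX, "(%s %c %s) %c %s",
                                      lit_name[i], op_sym[p], lit_name[j],
                                      op_sym[q], lit_name[k]);
                            return 1;
                        }
    return 0;
}
```

With this search order, 0xc0 comes back as "a & b" and 0x96 as "(a ^ b) ^ c".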