Reverse LUT for LOP3.LUT

rs277 · January 18, 2020, 7:15pm

Is there a reference table anywhere that lets one take the LUT value from a LOP3.LUT instruction in dumped SASS and see what the actual logic ops are?

Thanks,

njuffa · January 18, 2020, 7:27pm

Any number of different sequences of logic operations can map to the same truth table, i.e. this is a many-to-one mapping, not a bijection. Therefore one cannot unambiguously map a truth-table value back to a particular sequence of logic operations.

But one can certainly set up a logic equation corresponding to a LOP3.LUT truth table (see code below) and then transform that into any particular form (e.g. DNF, CNF, NAND) desired.

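#include <stdint.h>

/* emulate GPU's LOP3.LUT (three-input logic op with 8-bit truth table) */
/* bit i of ttbl is the output for inputs (a,b,c) = (bit 2, bit 1, bit 0) of i */
uint32_t lop3_fast (uint32_t a, uint32_t b, uint32_t c, uint8_t ttbl)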
{
    uint32_t r = 0;
    if (ttbl & 0x01) r |= ~a & ~b & ~c;
    if (ttbl & 0x02) r |= ~a & ~b &  c;
    if (ttbl & 0x04) r |= ~a &  b & ~c;
    if (ttbl & 0x08) r |= ~a &  b &  c;
    if (ttbl & 0x10) r |=  a & ~b & ~c;
    if (ttbl & 0x20) r |=  a & ~b &  c;
    if (ttbl & 0x40) r |=  a &  b & ~c;
    if (ttbl & 0x80) r |=  a &  b &  c;
    return r;
}

rs277 · January 18, 2020, 7:33pm

Thanks, I should have seen that (the many-to-one mapping).

I’m trying to optimise, and the LOP3s are making things somewhat opaque.

rs277 · December 30, 2023, 4:20am

For the record: Intel offer an AVX512 instruction “vpternlog” performing the same function as LOP3. They publish a couple of tables showing all 256 logic op combinations.

Currently it is at the beginning of “Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2C: Instruction Set Reference, V-Z”, available here.

njuffa · December 30, 2023, 8:33pm

@rs277 The canonicalization rules used to generate tables 5-1 and 5-2 in the Intel manual are not clear. Other than for the simplest cases using OR and AND, I am not sure these tables are particularly useful. I do not usually think in terms of NAND, NOR, XNOR.

For the forward direction (3-input logic formula to 8-bit truth table) there is this, of course:

    unsigned int a = 0xf0;
    unsigned int b = 0xcc;
    unsigned int c = 0xaa;
    printf ("%02x\n", (<logic-formula-using-inputs-a-b-c>) & 0xff); /* mask: ~ sets upper bits */

No dark secrets here, and this is also documented in the PTX manual.

After spending some more time looking into this, mapping arbitrary logic (of, say, 5 variables) in an optimal fashion to LOP3 instructions is a very hard problem indeed.

rs277 · December 30, 2023, 8:58pm

No, and I’ve referred to that PTX manual formula before.
For me, it’s more that, when examining SASS with a view to optimising, it is handy to know which boolean function is being used.

I found a more readable version of the same thing here.
