Typecast of &char[x] to uint32_t* varies with use of temp variable

I noticed an issue while debugging kernel code and have narrowed it down to the following.

int idx = blockDim.x * blockIdx.x + threadIdx.x; // idx is the thread ID

char buf[] = "abcdefghijkl";  // temporary variable
inputBuffer = buf;            // inputBuffer & outputBuffer are of type char*

// NOTE: Endianness: LITTLE

if ( idx == 0 )       // only thread 0
{
    uint32_t threeBytes = *((uint32_t*) &inputBuffer[3]); // temporary variable
    outputBuffer[0] = threeBytes & 0xFF;                   // OUTPUT: a (this is WRONG)
}
else if ( idx == 1 )  // only thread 1
{
    outputBuffer[1] = (*((uint32_t*) &inputBuffer[3])) & 0xFF; // OUTPUT: d (this is CORRECT)
}

After executing the code above, outputBuffer[0] holds ‘a’ and outputBuffer[1] holds ‘d’. The only difference between the two branches is the use of a temporary variable.

Any idea why there is such a difference, and what goes on underneath?
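One relevant detail: &inputBuffer[3] lies three bytes past the start of buf, so unless buf happens to start at an address that is 1 modulo 4, *((uint32_t*) &inputBuffer[3]) is a misaligned 32-bit load. Dereferencing a misaligned uint32_t* is undefined behavior in C/C++, and unlike x86 CPUs, NVIDIA GPUs do not transparently fix up misaligned loads, so neither ‘a’ nor ‘d’ is guaranteed. A guess that fits the ‘a’ (an assumption, not something confirmed here) is that the load ignored the low two address bits and read the aligned word holding buf[0..3]. The host-side sketch below models that guess with two hypothetical helpers, le32_at and rounded_down_load, assuming buf starts on a 4-byte boundary:

```cpp
#include <stddef.h>
#include <stdint.h>

// Little-endian value of the 4 bytes starting at byte offset `off` in buf.
constexpr uint32_t le32_at(const char* buf, size_t off)
{
    return  static_cast<uint32_t>(static_cast<unsigned char>(buf[off]))
         | (static_cast<uint32_t>(static_cast<unsigned char>(buf[off + 1])) << 8)
         | (static_cast<uint32_t>(static_cast<unsigned char>(buf[off + 2])) << 16)
         | (static_cast<uint32_t>(static_cast<unsigned char>(buf[off + 3])) << 24);
}

// ASSUMPTION, not documented hardware behavior: models a 32-bit load whose
// low two address bits are ignored, i.e. the byte offset is rounded down to
// a multiple of 4. Also assumes buf itself starts on a 4-byte boundary.
constexpr uint32_t rounded_down_load(const char* buf, size_t off)
{
    return le32_at(buf, off & ~static_cast<size_t>(3));
}
```

With buf = "abcdefghijkl", rounded_down_load(buf, 3) & 0xFF is ‘a’ while le32_at(buf, 3) & 0xFF is ‘d’, matching the two outputs. Thread 1's ‘d’ would then be consistent with the compiler reducing the `& 0xFF` expression to a single byte load of inputBuffer[3], which has no alignment requirement; that is also a guess, which the ld instructions in the full PTX output could confirm or rule out.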

Also, if instead of using the temporary variable ‘buf’ I assign the string literal “abcdefghijkl” to inputBuffer directly, both outputBuffer[0] and outputBuffer[1] get the same value, ‘d’. This is again weird; what changed?

Additional details from the PTX code:

    .version 1.4
    .target sm_12, map_f64_to_f32
    // compiled with /usr/local/cuda/open64/lib//be
    // nvopencc 3.1 built on 2010-06-08

    //-----------------------------------------------------------
    // Compiling /tmp/tmpxft_0000461d_00000000-7_gpu1.cpp3.i (/tmp/ccBI#.aP0shD)
    //-----------------------------------------------------------

    //-----------------------------------------------------------
    // Options:
    //-----------------------------------------------------------
    //  Target:ptx, ISA:sm_12, Endian:little, Pointer Size:32
    //  -O0 (Optimization level)
    //  -g2 (Debug level)
    //  -m2 (Report advisories)
    //-----------------------------------------------------------
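For comparison, the uint32_t* cast can be avoided altogether. A minimal sketch, assuming the goal is to read the four bytes starting at &inputBuffer[3] as a little-endian uint32_t: the hypothetical helper load_le32_bytewise below (not a CUDA API) builds the value from four single-byte loads, so the pointer needs no particular alignment. The #ifndef __CUDACC__ guard only exists so the same code also compiles as plain C++ on the host.

```cpp
#include <stdint.h>

// Lets the helper compile as plain C++ (g++/clang++) as well as CUDA (nvcc).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: reads the 4 bytes starting at p as a little-endian
// uint32_t. Every access is a 1-byte load, so p may have any alignment,
// unlike *((uint32_t*) p).
__host__ __device__ constexpr uint32_t load_le32_bytewise(const char* p)
{
    return  static_cast<uint32_t>(static_cast<unsigned char>(p[0]))
         | (static_cast<uint32_t>(static_cast<unsigned char>(p[1])) << 8)
         | (static_cast<uint32_t>(static_cast<unsigned char>(p[2])) << 16)
         | (static_cast<uint32_t>(static_cast<unsigned char>(p[3])) << 24);
}
```

Inside the kernel, `outputBuffer[0] = load_le32_bytewise(&inputBuffer[3]) & 0xFF;` would then be expected to yield ‘d’ whether or not a temporary variable is involved. The casts through unsigned char keep bytes above 0x7F from being sign-extended when char is signed.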