Convert Big Endian to Little Endian

I’ve copied a file to the GPU and now I want to analyse it.
Therefore I must convert some data from big endian to little endian.
On the CPU I have this macro:

//a and b are bytes
#define MAKE_INT16( a,b ) signed short((a<<8) | unsigned char( b ))

This works fine.
But on the GPU the unsigned char cast appears to be ignored, and the output can be wrong.

For example, with the bytes 0x01FC the output is 508 on the CPU and -4 on the GPU.
How can I do this right?
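One possible explanation for the -4 (an assumption, since the declarations of a and b are not shown): if b is a signed char holding 0xFC, it is promoted to the int -4 before the OR, and its set upper bits overwrite the high byte that came from a. Below is a minimal sketch in plain C that masks each byte with 0xFF instead of relying on the cast. The helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: shows the faulty result. b is promoted to int
 * with sign extension, so 0xFC becomes -4 (0xFFFFFFFC) before the OR.
 * Multiplying by 256 stands in for a << 8 and stays defined for negative a. */
static inline int16_t make_int16_sign_extended(signed char a, signed char b)
{
    return (int16_t)(((int)a * 256) | (int)b);
}

/* Hypothetical helper: masks both bytes to 0..255 after promotion, so the
 * result does not depend on whether the inputs are signed or unsigned.
 * Note: converting a value above 32767 to int16_t is implementation-defined
 * in C; the usual two's-complement compilers wrap it to a negative number. */
static inline int16_t make_int16_masked(signed char a, signed char b)
{
    return (int16_t)(((a & 0xFF) << 8) | (b & 0xFF));
}
```

With a = 0x01 and b = 0xFC (that is, -4 as a signed char), the masked version gives 508 and the sign-extending one gives -4.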

If you want to coalesce memory accesses (hint: you do), you can byte-swap two shorts at the same time as one int32_t:

[codebox] ( (x>>8) & 0x00ff00ff ) | ( (x<<8) & 0xff00ff00 )[/codebox]
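As a sketch, the packed swap above can be written as a function (the name swap_bytes_2x16 is hypothetical). It takes x as uint32_t rather than int32_t, which keeps both shifts well defined; left-shifting a signed value with its high bits set is not. Each 16-bit half has its two bytes exchanged independently.

```c
#include <stdint.h>

/* Hypothetical helper: byte-swaps two packed 16-bit values in one 32-bit word.
 * The first term moves each high byte down, the second moves each low byte up. */
static inline uint32_t swap_bytes_2x16(uint32_t x)
{
    return ((x >> 8) & 0x00ff00ffu) | ((x << 8) & 0xff00ff00u);
}
```

For example, swap_bytes_2x16(0x01FC02FDu) yields 0xFC01FD02, and applying the function twice returns the original word.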

Yeah, that works!
But I think it is a compiler bug if a cast from signed to unsigned is ignored.

I am unable to reproduce this issue with a recent (internal) compiler. See below for my test app and the resulting output. Does this problem reproduce with CUDA 3.1? If so, could you please post a minimal, self-contained repro case? Thanks.

[codebox]#include <stdio.h>
#include <stdlib.h>

#define MAKE_INT16(a,b) (signed short)((a<<8) | (unsigned char)b)

// Kernel: combine two bytes into a 16-bit value on the device.
__global__ void makeint16 (unsigned char a, unsigned char b, short *res)
{
    *res = MAKE_INT16 (a, b);
}

int main (void)
{
    short res;
    short *res_d;
    unsigned char a = 0x01;
    unsigned char b = 0xfc;

    printf ("a = %02x b = %02x\n", (int)a, (int)b);
    cudaMalloc((void**)&res_d, sizeof(res_d[0]));
    makeint16<<<1,1>>>(a, b, res_d);
    // Copy the result back to the host; this also waits for the kernel.
    cudaMemcpy(&res, res_d, sizeof(res_d[0]), cudaMemcpyDeviceToHost);
    printf ("res = %d\n", res);
    cudaFree(res_d);
    return EXIT_SUCCESS;
}[/codebox]


a = 01 b = fc

res = 508