Handling structures with bitfields

I’m dealing with point clouds in the LAS format as defined here (http://www.asprs.org/wp-content/uploads/2010/12/LAS_1_4_r13.pdf) [Table 12].

This translates into a struct like so:

long           mX;
long           mY;
long           mZ;
unsigned short mIntensity;
unsigned int   mReturnNumber : 3;     // these 4 fields form 1 byte
unsigned int   mNumberOfReturns : 3;  //
unsigned int   mScanDirection : 1;    //
unsigned int   mEdgeOfLine : 1;       //
unsigned char  mClassification;
char           mScanAngleRank;
unsigned char  mChannel;
unsigned short mSourceId;
double         mGpsTime;
unsigned short mRed;
unsigned short mGreen;
unsigned short mBlue;

I need to process billions of these structs through CUDA, but CUDA doesn’t support the #pragma pack necessary to keep that structure at the correct size of 34 bytes.

Is there any way of keeping this data in its native format for processing in CUDA, or will there have to be a non-CUDA conversion step to add and remove the padding it seems to require? If so, this will greatly slow the whole process down, and we’re going for speed here, obviously.

Seems like you are on Windows? Are you certain those bit fields are not already packed by the Windows compiler?

Windows has some structure alignment rules that require careful study. Your structure, as-is, compiles to 48 bytes for me on Windows (MS VC++, CUDA 8). That is nowhere near 34 bytes, and the discrepancy is not due to the bit field handling. For example, removing the first 3 long quantities reduces the structure size by 16 bytes. Furthermore, if I use #pragma pack(1), the structure size ends up at 37 bytes, which makes sense to me: each bit field occupies a byte, so the four of them take 4 bytes rather than the 1 byte you are counting on.

Are you allowed to modify the structure at all?

For example, defining the structure this way on Linux will pack the bitfields:

struct test {
long           mX;
long           mY;
long           mZ;
unsigned short mIntensity;
unsigned int   mReturnNumber : 3;     // these 4 fields form 1 byte
unsigned int   mNumberOfReturns : 3;  //
unsigned int   mScanDirection : 1;    //
unsigned int   mEdgeOfLine : 1;       //
unsigned char  mClassification;
char           mScanAngleRank;
unsigned char  mChannel;
unsigned short mSourceId;
double         mGpsTime;
unsigned short mRed;
unsigned short mGreen;
unsigned short mBlue;
} __attribute__ ((__packed__));

On Linux, the sizeof(test) as above is 46 bytes, which is fully packed and correct, considering that sizeof(long) is 8 bytes on Linux. Removing the 12 extra bytes the longs contribute would reduce it to the 34 bytes you mention. But it’s not obvious to me that you can get there on Windows, without some more information from your side on the exact struct definition.
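
As a quick sketch of that substitution (the struct name and static_assert are mine, and I have only reasoned this through, not verified it on Windows):

#include <cstdint>

// The struct from above with fixed-width int32_t substituted for long.
// Fully packed, this should come out at exactly 34 bytes.
struct test34 {
    int32_t        mX;
    int32_t        mY;
    int32_t        mZ;
    uint16_t       mIntensity;
    unsigned int   mReturnNumber : 3;     // these 4 fields form 1 byte
    unsigned int   mNumberOfReturns : 3;  //
    unsigned int   mScanDirection : 1;    //
    unsigned int   mEdgeOfLine : 1;       //
    uint8_t        mClassification;
    int8_t         mScanAngleRank;
    uint8_t        mChannel;
    uint16_t       mSourceId;
    double         mGpsTime;
    uint16_t       mRed;
    uint16_t       mGreen;
    uint16_t       mBlue;
} __attribute__ ((__packed__));

static_assert(sizeof(test34) == 34, "expected a fully packed 34-byte record");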

My advice, based on experience gathered over 25 years: avoid bit fields. Instead, use fixed-size integer types like uint8_t, uint16_t, etc. as appropriate, and insert and extract bits with suitable inline functions or macros. Note that the LAS specification itself does not list bit fields as a supported data type; instead it defines various supported data types comprising 1, 2, 4, or 8 bytes (page 4).
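
A sketch of what such helpers could look like (the names are mine, not from any library; the HD macro just lets the same code compile for host and device under nvcc):

#include <cstdint>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Extract 'width' bits starting at bit 'pos' (pos 0 = least significant).
HD inline uint8_t extract_bits(uint8_t byte, int pos, int width)
{
    return (uint8_t)((byte >> pos) & ((1u << width) - 1u));
}

// Return 'byte' with the 'width'-bit field at 'pos' replaced by 'value'.
HD inline uint8_t insert_bits(uint8_t byte, int pos, int width, uint8_t value)
{
    uint8_t mask = (uint8_t)(((1u << width) - 1u) << pos);
    return (uint8_t)((byte & ~mask) | ((uint8_t)(value << pos) & mask));
}

// Example, assuming mReturnNumber occupies the 3 least significant bits:
// uint8_t ret = extract_bits(flagByte, 0, 3);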

The guarantees provided by the C++ standard are often insufficient for what people need in situations where bit fields appear to be an appropriate solution, which is exactly what you found out here. With few exceptions, #pragma is used for implementation-defined language elements, so use of #pragma is likewise best avoided where possible.

Substitute int32_t for the longs and uint16_t for the unsigned shorts, to make the byte size of all the other fields explicit.

But it was the bitfield variables that were the problem (as marked with the comment). To have them take up only the requested bits, the #pragma pack command is required, which CUDA doesn’t support (the compiler says as much when attempting to compile).

For now, I’m going to see if using #pragma align(1) will prevent any padding, with the four bitfield variables replaced by a single uint8_t that I’ll manually bit-pack to replicate the originals. That’s the only hope I see right now of using this structure as-is, which I need to do, since it’s an industry-standard format.

If that fails, it’s down to treating the data as a raw byte array and inserting everything manually.

The LAS specification does not actually use bit fields, as it does not support bit fields as a data type. What the specification shows where you currently use bit fields is bytes, plus the meaning of individual bits within those bytes. My suggestion is to handle the data exactly as the standard describes: retrieve the full byte (i.e. as a uint8_t), then extract the desired bits by shift and mask. Inserting bits works analogously.
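
For example (a sketch; the offset of 14 is my count from a fully packed record, and the bit positions assume the return number sits in the 3 least significant bits):

#include <cstdint>

// 'rec' points at the first byte of one raw, fully packed 34-byte record.
inline uint8_t return_number(const uint8_t *rec)
{
    uint8_t flags = rec[14];   // the byte holding the four bit-coded fields
    return flags & 0x07;       // return number = bits 0-2
}

inline void set_return_number(uint8_t *rec, uint8_t n)
{
    rec[14] = (uint8_t)((rec[14] & ~0x07) | (n & 0x07));  // clear, then OR in
}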

I have dealt with very similar use cases before, e.g. parsing and manipulating headers for the various layers of network traffic. My experience, again and again, has been that bit fields are not the way to go.

I wrote my last reply before I saw your post, @njuffa. As I said, I am avoiding the bitfields currently. There’s still padding going on in CUDA when this struct is used, however; at the very least, the size comes out wrong. I’ll find the problem before long, though, as long as CUDA will accept a 34-byte struct without padding somehow.

I didn’t have any trouble compiling with #pragma pack on CUDA on Windows.

However, I couldn’t get my structure down to 34 bytes, no matter what I did with #pragma pack, which is why I asked for an exact example that achieves this on Windows (with or without CUDA).

Sorry, it seems I misunderstood the focus of the question. Calling out “struct with bitfields” in the subject line somehow suggested to me that the issue of interest is bit fields.

Caveat: ‘long’ must be avoided, since it is a 32-bit type on 64-bit Windows and a 64-bit type on 64-bit Linux, i.e. it is non-portable. You used int32_t instead, which is a good substitution.

Because GPUs (like some CPUs) require all data to be naturally aligned, you cannot get the packing you desire. Look at the offsets relative to the start of the struct, which itself will be 8-byte aligned based on its largest constituent element (I think; double-check):

+  0 long           mX;
+  4 long           mY;
+  8 long           mZ;
+ 12 unsigned short mIntensity;
+ 14 uint8_t        misc_bits; // mReturnNumber, mNumberOfReturns, mScanDirection, mEdgeOfLine 
+ 15 unsigned char  mClassification;
+ 16 char           mScanAngleRank;
+ 17 unsigned char  mChannel;
+ 18 unsigned short mSourceId;
+ 20 double         mGpsTime;   // <<<<<<< eek!  we need 8-byte alignment, but have 4-byte alignment
+ 28 unsigned short mRed;
+ 30 unsigned short mGreen;
+ 32 unsigned short mBlue;

The CUDA compiler must pad before mGpsTime so it can be accessed according to the hardware restrictions; otherwise undefined behavior results (the access will return garbage data).

From a portability standpoint, this is a poorly designed struct (all the world is not an x86 platform with support for unaligned loads). Since you cannot change the specification, you will need to use a byte array and copy out the data appropriately. Or (some may consider this hacky), replace

double mGpsTime;

with

int32_t mGpsTime_low;
int32_t mGpsTime_high;

and then use CUDA’s __hiloint2double() intrinsic to get the proper double value.
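
A minimal device-side sketch of that approach (the struct and function names are mine; this assumes the low 4 bytes of the little-endian double come first in the record):

#include <cstdint>

struct GpsTimeSplit {
    int32_t mGpsTime_low;    // low 32 bits of the original double
    int32_t mGpsTime_high;   // high 32 bits of the original double
};

// Reassemble the double from its two naturally aligned 32-bit halves.
__device__ double gps_time(const GpsTimeSplit &t)
{
    return __hiloint2double(t.mGpsTime_high, t.mGpsTime_low);
}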

Good catch, I missed the double alignment mismatch.

To follow up on this, I did get CUDA to handle the class with the correct size and packing, like so:

#pragma pack(push,1)
class PointsType // 34 bytes
{
public:
    /* constructors, etc. here */
    int32_t        mX;
    int32_t        mY;
    int32_t        mZ;
    uint16_t       mIntensity;
    //unsigned int mReturnNumber : 3;
    //unsigned int mNumberOfReturns : 3;
    //unsigned int mScanDirection : 1;
    //unsigned int mEdgeOfLine : 1;
    uint8_t        mMulti;
    uint8_t        mClassification;
    int8_t         mScanAngleRank;
    uint8_t        mChannel;
    uint16_t       mSourceId;
    double         mGpsTime;
    uint16_t       mRed;
    uint16_t       mGreen;
    uint16_t       mBlue;
};
#pragma pack(pop)

And to access the data that was in the bitfields, the code uses the mMulti variable with whatever masking and bit shifting is required.
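
For example, the accessors could look like this (a sketch; the bit positions assume the same low-bits-first order as the original bitfields):

// Inside class PointsType; __host__ __device__ makes them usable in kernels too.
__host__ __device__ uint8_t returnNumber()    const { return  mMulti       & 0x07; } // bits 0-2
__host__ __device__ uint8_t numberOfReturns() const { return (mMulti >> 3) & 0x07; } // bits 3-5
__host__ __device__ uint8_t scanDirection()   const { return (mMulti >> 6) & 0x01; } // bit 6
__host__ __device__ uint8_t edgeOfLine()      const { return (mMulti >> 7) & 0x01; } // bit 7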

So while the error message suggested to me that #pragma pack() was at fault, it was actually only the combination of #pragma pack and the bitfields.

Thank you, everyone, for your assistance. It always helps to bounce problems off others when stuck. :)

Are you using this structure successfully in device code? Unless my count is off, ‘double mGpsTime’ is still at an offset of 20 bytes relative to the starting address of the struct, which means it cannot be guaranteed to be naturally aligned (i.e. to an 8-byte boundary), which is required to avoid undefined behavior when accessing this data on the device.

The kernel builds this structure as its output and, once written to a file with an appropriate header, it formed a valid point cloud with no errors when viewed in a third-party reader. So if the size were off, it would have failed to load. I did not check the value of the GPS time field, though, so there’s a possibility that the size was right but those last fields weren’t filled in correctly. I’ll double-check that when I return to the project. If it is a problem after all, I’ll probably split the double as suggested above to force alignment.