Help with uncoalesced loads: data structure problem

Hello all! I’m having a few problems with uncoalesced loads from global memory and think it may be due to how my data is organized.

First of all, what I need is to pass N buffers of the same size to my kernel.

Currently, what I have is something similar to:

/* TupleAttr is defined first so that TupleSOA can use the typedef */
typedef struct _tuple_attr {
	disc_type type;
	union {
		int *buffer_int;
		float *buffer_float;
	} attr_data;
	int size;
} TupleAttr;

typedef struct _tuple_soa {
	TupleAttr *attributes;
	int size;
} TupleSOA;

Here, attributes points to an array of N TupleAttr structs.
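For reference, here is a minimal host-side sketch of one way such a structure might be built and freed, using plain malloc/calloc so it runs without a GPU. It assumes disc_type is an enum and that TupleSOA.size holds N; the enumerator names DISC_INT and DISC_FLOAT and the helpers tuple_soa_create and tuple_soa_free are hypothetical. In a real CUDA program each buffer would be a device allocation, and the TupleAttr array (and the TupleSOA itself, if the kernel receives a pointer to it) would also have to live in device memory before the kernel could dereference it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element-type tag; the original post does not show disc_type. */
typedef enum { DISC_INT, DISC_FLOAT } disc_type;

typedef struct _tuple_attr {
	disc_type type;
	union {
		int *buffer_int;
		float *buffer_float;
	} attr_data;
	int size;              /* elements in this attribute's buffer */
} TupleAttr;

typedef struct _tuple_soa {
	TupleAttr *attributes; /* array of N attributes */
	int size;              /* assumed: number of attributes (N) */
} TupleSOA;

/* Hypothetical helper: frees every buffer, the attribute array and the tuple. */
void tuple_soa_free(TupleSOA *t)
{
	if (!t) return;
	for (int i = 0; i < t->size; ++i) {
		TupleAttr *a = &t->attributes[i];
		if (a->type == DISC_INT) free(a->attr_data.buffer_int);
		else free(a->attr_data.buffer_float);
	}
	free(t->attributes);
	free(t);
}

/* Hypothetical helper: n attributes, each with its own zeroed buffer of len
   elements; types[i] selects int or float storage. Returns NULL on failure. */
TupleSOA *tuple_soa_create(int n, int len, const disc_type *types)
{
	assert(n > 0 && len > 0);
	TupleSOA *t = malloc(sizeof *t);
	if (!t) return NULL;
	t->attributes = calloc((size_t)n, sizeof *t->attributes);
	if (!t->attributes) { free(t); return NULL; }
	t->size = n;
	for (int i = 0; i < n; ++i) {
		TupleAttr *a = &t->attributes[i];
		a->type = types[i];
		a->size = len;
		/* One separate allocation per attribute: the kernel must load this
		   pointer before it can touch any element. */
		void *buf = (a->type == DISC_INT)
			? calloc((size_t)len, sizeof(int))
			: calloc((size_t)len, sizeof(float));
		if (!buf) { t->size = i; tuple_soa_free(t); return NULL; }
		if (a->type == DISC_INT) a->attr_data.buffer_int = buf;
		else a->attr_data.buffer_float = buf;
	}
	return t;
}
```

Each element access then goes through two pointer loads (attributes, then buffer_int or buffer_float) before the element itself is read.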

In my kernel I’ll be reading from and writing to the buffer_int and buffer_float buffers inside each TupleAttr.

I’m not sure if this is the right way to represent my data in order to get coalesced accesses… Should I try something different?

Okay, so after profiling and trying to understand the results, I’m kinda confused.

The way I’m accessing the memory is:

int i = in->attributes[id_attr].attr_data.buffer_int[id_thread];

out->attributes[0].attr_data.buffer_int[id_thread] = i;

What confuses me is that if I don’t write to the output buffer (out), I get no uncoalesced accesses. So why do I get uncoalesced accesses when I write to the output?
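One way to check whether the element access itself is the problem is to list the byte addresses a warp touches. The sketch below is a simplified host-side model, not the profiler's exact rule: it assumes a 32-thread warp and counts how many distinct 128-byte segments a single warp-wide access falls into (1 is the ideal). The function name segments_touched and its stride parameter are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WARP_SIZE 32
#define SEGMENT_BYTES 128 /* assumed transaction granularity for this model */

/* Count distinct SEGMENT_BYTES-aligned segments touched when thread t of one
   warp accesses base + (first + t * stride) * elem_bytes.
   stride == 1 is the pattern of buffer_int[id_thread];
   stride == 0 is every thread reading the same address, which is what
   happens when each thread loads in->attributes or attr_data.buffer_int. */
int segments_touched(uintptr_t base, size_t elem_bytes, size_t first, size_t stride)
{
	assert(elem_bytes > 0);
	uintptr_t seen[WARP_SIZE];
	int count = 0;
	for (size_t t = 0; t < WARP_SIZE; ++t) {
		uintptr_t seg = (base + (first + t * stride) * elem_bytes) / SEGMENT_BYTES;
		int dup = 0;
		for (int k = 0; k < count; ++k)
			if (seen[k] == seg) { dup = 1; break; }
		if (!dup) seen[count++] = seg;
	}
	return count;
}
```

With an aligned base, segments_touched(base, sizeof(int), 0, 1) is 1, so buffer_int[id_thread] is the textbook coalesced pattern; a misaligned start (first = 8 gives 2) or a large stride (stride = 32 gives 32) is what drives the count up. This model treats a same-address read (stride 0) as one segment. Under the stricter coalescing rules of early devices (compute capability 1.0/1.1), a half-warp had to read consecutive words, so the per-thread loads of the pointers themselves may be a candidate for the uncoalesced counts worth ruling out separately.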

Removing the output line probably let the compiler optimize the whole kernel away, since its result was never used, so it didn’t do anything; therefore no uncoalesced accesses :)

Your kernel probably didn’t run at all.

eyal

That makes sense… Hmm, but I still need to write to the output. Is there a better option than what I’m doing?
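One commonly suggested alternative is to drop the per-attribute pointers and store all N buffers in one contiguous allocation, one row per attribute, so the kernel computes an offset instead of chasing pointers. The sketch below shows only the indexing, in plain C. The names FlatTuples, flat_create, flat_at and flat_free, the padding of each row to a multiple of 32 elements (128 bytes of int), and the restriction to int data are all assumptions; float attributes could use a second buffer laid out the same way. On the device, the same layout could come from a single cudaMalloc of n * pitch elements (or from cudaMallocPitch), with the base pointer and pitch passed directly as kernel arguments.

```c
#include <assert.h>
#include <stdlib.h>

#define ROW_ALIGN 32 /* elements; 32 ints = 128 bytes, an assumed alignment target */

typedef struct {
	int *data;  /* n rows of pitch ints, one row per attribute */
	int n;      /* number of attributes (N) */
	int len;    /* valid elements per attribute */
	int pitch;  /* row stride in elements: len rounded up to ROW_ALIGN */
} FlatTuples;

/* Round len up to the next multiple of ROW_ALIGN so every row starts
   at the same alignment as the base allocation. */
static int flat_pitch(int len)
{
	return (len + ROW_ALIGN - 1) / ROW_ALIGN * ROW_ALIGN;
}

/* Returns 1 on success, 0 if the allocation failed. */
int flat_create(FlatTuples *f, int n, int len)
{
	assert(n > 0 && len > 0);
	f->n = n;
	f->len = len;
	f->pitch = flat_pitch(len);
	f->data = calloc((size_t)n * (size_t)f->pitch, sizeof(int));
	return f->data != NULL;
}

/* Element tid of attribute attr. In a kernel, tid would be the thread index,
   so consecutive threads hit consecutive ints within one row. */
int *flat_at(const FlatTuples *f, int attr, int tid)
{
	assert(attr >= 0 && attr < f->n && tid >= 0 && tid < f->len);
	return &f->data[(size_t)attr * (size_t)f->pitch + (size_t)tid];
}

void flat_free(FlatTuples *f)
{
	free(f->data);
	f->data = NULL;
}
```

With len = 100 the pitch becomes 128, so attribute rows start 128 elements apart and a warp reading one row still touches consecutive addresses. The padding trades a little memory for aligned row starts, and the kernel needs only one pointer plus two integers instead of an array of structs.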