How to replace STL types

Hi.

I was just wondering about the best way to replace STL types like std::vector and std::list.

To be a little bit more specific, I have something like

std::vector<std::vector<std::list<struct>>>

Is there some “golden rule” for transforming such constructs into other data types, so that I can put them in global memory and work on the data with CUDA?

Here’s even more information about my construct…

Any help and any hint is very appreciated.

Thanks!

The closest thing to a golden rule is: “CUDA works best with contiguous reads of flat arrays of simple data types.” Applying this rule leads to several data structure decisions:

  • N-dimensional arrays should be represented as linear arrays with known strides for each dimension. Whether row-major or column-major order is better depends on your access pattern.

  • A struct of arrays is better than an array of structs. This ensures that when threads all access a particular struct member, the access is coalesced by the memory controller.

For Fermi-class devices with an L1 and L2 cache, violating the “struct of arrays” guideline is not as big of a deal because the cache will read contiguous chunks of memory even if the threads do not.

Unfortunately, it sounds like you have a 2D array of variable-length lists whose elements are structs. This is going to be hard to reconcile with the golden rule. What is the data access pattern like? How will threads in the same block vs. different blocks interact with this data structure?

Thanks for your reply!

Well, the access pattern will look something like this:

  • the first two std::vectors are like the x/y-coordinates of a point

  • the std::list is a list of neighbors of the current point

  • and in the struct there are several properties of each neighbor

At the moment, the calculation is done for one point after another. I want to do that in parallel with CUDA.

So… each thread should access a point and do some calculations for all the neighbors of that particular point.

I hope this helps.

But how to represent the data in CUDA?

Should I have an array of structs with an array of structs and so on?

I can’t see another option at this point. Good to hear that the “golden rule” isn’t as important on Fermi, because I’m on that platform.

Something like this, for example:

struct tableEntry
{
	unsigned int index;  // index of the neighboring point
	float weight;        // a property of that neighbor
};

struct list
{
	tableEntry *tE;      // variable-length neighbor list of one point
};

struct vectorY
{
	list *l;             // one neighbor list per y-coordinate
};

struct vectorX
{
	vectorY *vY;         // one column of lists per x-coordinate
} *vX;

Doesn’t look very satisfying…

The other thing is, I have to figure out how much memory to allocate for each struct, because not every point has exactly the same number of neighbors… well, I think I’ll have to pad here…

Any (better) ideas?

Take a look at Thrust’s bucket_sort_2d example [1], which constructs a similar data structure.

[1] http://code.google.com/p/thrust/source/browse/examples/bucket_sort2d.cu
