Complex network and coalescing

Hi all,

I was wondering whether there is any literature on, or whether anyone has experience with, memory layouts for complex networks with didactic (but also recurrent) organisation. I’ll explain my problem:

Each node in my network has many incoming and outgoing connections (fan in/fan out). This leaves me with two options: organise the memory layout in favour of the fan in, or organise it in favour of the fan out. Whichever I choose, it looks like I will have to access the other one in an uncoalesced fashion.

Just wondering if anyone knew anything, or had any out-of-the-box ideas?

Not knowing any specifics, here are a couple of very generic thoughts:

(1) Have you looked into buffering data in shared memory? This may allow you to read and write global memory with full coalescing, by performing all data swizzling as part of the shared memory accesses.

(2) Assuming the amount of data to be written is less than or equal to the amount of data read, performance work should focus on the global memory loads. To first order, writes to global memory can be treated as “fire & forget”.
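
To make point (1) concrete, here is a minimal sketch of staging data through shared memory, using a plain matrix transpose as a stand-in for whatever reordering the real code needs. The kernel name, the helper `runTransposeCheck`, and the tile size are all made up for illustration. Both the global read and the global write are coalesced; the reordering happens only in the shared memory indexing.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 32;  // assumed tile size, one warp wide

// Transposes "in" (rows x cols, row-major) into "out" (cols x rows, row-major).
__global__ void transposeViaShared(const float* in, float* out, int rows, int cols)
{
    __shared__ float tile[TILE][TILE + 1];  // +1 column of padding avoids bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;  // input column
    int y = blockIdx.y * TILE + threadIdx.y;  // input row
    if (x < cols && y < rows)
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x];  // coalesced read

    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;  // output column
    y = blockIdx.x * TILE + threadIdx.y;  // output row
    if (x < rows && y < cols)
        out[y * rows + x] = tile[threadIdx.x][threadIdx.y];  // coalesced write
}

// Hypothetical host helper: true if the GPU result matches a CPU transpose.
bool runTransposeCheck(int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    const size_t n = size_t(rows) * cols, bytes = n * sizeof(float);
    std::vector<float> hIn(n), hOut(n, -1.0f);
    for (size_t i = 0; i < n; ++i) hIn[i] = float(i);

    float *dIn = nullptr, *dOut = nullptr;
    bool ok = cudaMalloc(&dIn, bytes) == cudaSuccess &&
              cudaMalloc(&dOut, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice);
        dim3 block(TILE, TILE);
        dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
        transposeViaShared<<<grid, block>>>(dIn, dOut, rows, cols);
        ok = cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);
    cudaFree(dOut);
    if (!ok) return false;

    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (hOut[size_t(c) * rows + r] != hIn[size_t(r) * cols + c]) return false;
    return true;
}
```

Calling `runTransposeCheck(70, 45)` from `main` exercises partial tiles as well. Whether a trick like this carries over to an irregular network depends on how much locality the connections have, which I can't tell from the description.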

Thanks for the reply. I am using shared mem where possible to perform “swizzling”; the problem lies in the initial global reads. I’ll try to elaborate a bit more. There is a memory layout for each node in the network and a layout for each connection. Every iteration of my program, I need to process the incoming connections to each node and the outgoing connections from each node. The problem, as you can hopefully already see, is that each connection is both an incoming and an outgoing connection, meaning that I can only order the connections in either a pre or a post fashion. For example, if I order the connections in a pre fashion, then my reads for the pre processing are coalesced, but my reads for the post processing are not.
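
To illustrate the access patterns, here is a rough sketch under assumptions that may not match the real code: connections are stored once, sorted by pre node, and the post pass walks them through an index array sorted by post node. All names (`outgoingPass`, `incomingPass`, `postOrder`, `runFanInFanOutCheck`) are hypothetical, and the `atomicAdd` on `float` (compute capability 2.0 or later) simply stands in for whatever per-node work is really done.

```cuda
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>
#include <cuda_runtime.h>

// Connections are sorted by pre node. Thread i handles connection i,
// so the reads of pre[i] and w[i] are coalesced.
__global__ void outgoingPass(const int* pre, const float* w, float* outSum, int nConn)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nConn) atomicAdd(&outSum[pre[i]], w[i]);
}

// Thread j handles the j-th connection in post order. postOrder[j] is read
// coalesced, but post[c] and w[c] are scattered (uncoalesced) gathers.
__global__ void incomingPass(const int* postOrder, const int* post, const float* w,
                             float* inSum, int nConn)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < nConn) {
        int c = postOrder[j];
        atomicAdd(&inSum[post[c]], w[c]);
    }
}

// Hypothetical host helper: runs both passes on a tiny made-up network and
// compares the per-node sums against a CPU reference.
bool runFanInFanOutCheck()
{
    const int nNodes = 4;
    std::vector<int> pre  = {0, 0, 1, 2, 2, 3};  // sorted by pre node
    std::vector<int> post = {1, 2, 3, 0, 1, 2};
    std::vector<float> w  = {1, 2, 3, 4, 5, 6};  // small integers: sums are exact
    const int nConn = int(pre.size());
    assert(post.size() == pre.size() && w.size() == pre.size());

    // Connection indices sorted by post node.
    std::vector<int> postOrder(nConn);
    std::iota(postOrder.begin(), postOrder.end(), 0);
    std::stable_sort(postOrder.begin(), postOrder.end(),
                     [&](int a, int b) { return post[a] < post[b]; });

    const size_t ib = nConn * sizeof(int), fb = nConn * sizeof(float);
    const size_t nb = nNodes * sizeof(float);
    int *dPre = nullptr, *dPost = nullptr, *dOrder = nullptr;
    float *dW = nullptr, *dOut = nullptr, *dIn = nullptr;
    std::vector<float> hOut(nNodes), hIn(nNodes);
    bool ok = cudaMalloc(&dPre, ib) == cudaSuccess && cudaMalloc(&dPost, ib) == cudaSuccess &&
              cudaMalloc(&dOrder, ib) == cudaSuccess && cudaMalloc(&dW, fb) == cudaSuccess &&
              cudaMalloc(&dOut, nb) == cudaSuccess && cudaMalloc(&dIn, nb) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dPre, pre.data(), ib, cudaMemcpyHostToDevice);
        cudaMemcpy(dPost, post.data(), ib, cudaMemcpyHostToDevice);
        cudaMemcpy(dOrder, postOrder.data(), ib, cudaMemcpyHostToDevice);
        cudaMemcpy(dW, w.data(), fb, cudaMemcpyHostToDevice);
        cudaMemset(dOut, 0, nb);
        cudaMemset(dIn, 0, nb);
        const int threads = 128, blocks = (nConn + threads - 1) / threads;
        outgoingPass<<<blocks, threads>>>(dPre, dW, dOut, nConn);
        incomingPass<<<blocks, threads>>>(dOrder, dPost, dW, dIn, nConn);
        ok = cudaMemcpy(hOut.data(), dOut, nb, cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(hIn.data(), dIn, nb, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dPre); cudaFree(dPost); cudaFree(dOrder);
    cudaFree(dW); cudaFree(dOut); cudaFree(dIn);
    if (!ok) return false;

    std::vector<float> refOut(nNodes, 0.0f), refIn(nNodes, 0.0f);
    for (int c = 0; c < nConn; ++c) { refOut[pre[c]] += w[c]; refIn[post[c]] += w[c]; }
    return hOut == refOut && hIn == refIn;
}
```

The gather through `postOrder` is exactly the uncoalesced read in question. Storing a second, post-sorted copy of the connection data would make both passes coalesced, at the cost of extra memory and of keeping the two copies in sync whenever the connection data changes.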