**DISCLAIMER**: I know this question may be a bit **off topic**, since it is not directly related to an NVIDIA product. Nevertheless, I hope someone among you, being an expert in the field of AI, could give me a suggestion on how to tackle this problem. I apologise for this, but I’d really love some advice from an expert!

Now to the real question.

I am having trouble understanding which kind of ML architecture to adopt for an academic example.

The physical problem involves fluid dynamics in a “system of pipes”. There exist ad-hoc numerical methods for solving these problems, which I use to generate the output data that I later use to train the network. The “problem” lies in the dimensional mismatch between input and output. The input consists of data describing the “system discretization & characteristics”, which is basically the input these numerical models need to run. It can be put in either a 1D or 2D tensor format, where in the latter the input data is simply grouped per “segment”.
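To make the two input layouts concrete, here is a minimal sketch (the feature names are purely hypothetical placeholders for whatever each segment actually carries):

```python
import numpy as np

n_segments = 17  # base-segments in the pipe system
n_features = 5   # per-segment data, e.g. length, diameter, ... (hypothetical)

# 2-D layout: one row of features, one column per base-segment
x_2d = np.random.rand(n_features, n_segments)   # shape (5, 17)

# 1-D layout: the same data flattened into a single vector
x_1d = x_2d.reshape(-1)                         # shape (85,)

print(x_2d.shape, x_1d.shape)
```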

The output, instead, is a bit different. While it shares the same base system discretization, it “augments” it, in the sense that each base-segment gets divided into a different number of sub-segments. Additionally, it depends on a time discretization (-> dynamics). So, to give a concrete example: the input might have shape (5, 17) (5 pieces of base information for each of the 17 base-segments), while the output has shape (355, 100) (where 355 is the total number of sub-segments and 100 the number of time steps).

So far I have been using, for “inheritance issues”, both 1D and 2D Dense-Encoder-Decoder CNNs. From the literature I have been able to retrieve, I gather that these (especially the 2D variants) are highly suited to (2D) images, or to problems that can be translated into a “state-image” counterpart, where each pixel is spatially correlated with its neighbours. In this case, I really cannot see how that applies: there is some sort of “spatial coherence” between the base-segments, but clearly not of the same kind as between image pixels.
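For reference, this is roughly the kind of 1D Dense-Encoder-Decoder CNN I have been using, just to show the shape mapping (5, 17) -> (355, 100). All layer sizes here are illustrative, not my actual tuned values:

```python
import torch
import torch.nn as nn

class DenseEncoderDecoder(nn.Module):
    """Sketch of a 1-D conv encoder + dense bottleneck + dense decoder.
    Maps a (5, 17) system description to a (355, 100) space-time field."""

    def __init__(self):
        super().__init__()
        # Encoder: 1-D convolutions over the 17 base-segments,
        # treating the 5 per-segment features as input channels.
        self.encoder = nn.Sequential(
            nn.Conv1d(5, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),          # -> (batch, 64 * 17)
        )
        # Dense bottleneck.
        self.bottleneck = nn.Sequential(
            nn.Linear(64 * 17, 256),
            nn.ReLU(),
        )
        # Decoder: one dense layer, reshaped to (sub-segments, time steps).
        self.decoder = nn.Linear(256, 355 * 100)

    def forward(self, x):          # x: (batch, 5, 17)
        z = self.bottleneck(self.encoder(x))
        return self.decoder(z).view(-1, 355, 100)

model = DenseEncoderDecoder()
y = model(torch.rand(4, 5, 17))
print(y.shape)                     # (4, 355, 100)
```

Note how the convolutions only ever “see” neighbouring base-segments, which is exactly where my doubt about the image analogy comes from.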

Now the question: hoping I was able to at least give an idea of the problem, **what would you suggest in terms of architecture type when dealing with this kind of problem**?

Thanks to everyone.

Cheers :)