No ideas, I suspect. Generally speaking, doubly nested variable-length arrays are a bad idea in CUDA, since it is hard to access them in a coalesced way. If there is a way to flatten your data structure, it will be much easier to work with on the GPU.

Aside from the host/device pointer issue which caused the crash, this is still a bad idea. In fact, it’s exactly the same as the vector version but without the memory leak protection the STL provides. You do not want to go pointer chasing on the GPU. You want to have one big, flat, 1D array, and compute indices into that.
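To make the "one big, flat, 1D array" idea concrete for rows of different lengths, here is a minimal host-side sketch. The names FlatRows, flatten and at are made up for illustration and are not from the original code, and float is just an assumed element type. All rows are packed back to back, and a second array of offsets records where each row starts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: every row packed back to back in one buffer.
struct FlatRows {
    std::vector<float> data;           // all elements, row after row
    std::vector<std::size_t> offsets;  // offsets[r] = start of row r,
                                       // offsets.back() = data.size()
};

// Copy a nested (pointer-chasing) layout into the flat layout.
FlatRows flatten(const std::vector<std::vector<float>>& nested) {
    FlatRows flat;
    flat.offsets.reserve(nested.size() + 1);
    flat.offsets.push_back(0);
    for (const auto& row : nested) {
        flat.data.insert(flat.data.end(), row.begin(), row.end());
        flat.offsets.push_back(flat.data.size());
    }
    return flat;
}

// Element (row, col) is found with one addition, no second pointer to follow.
float at(const FlatRows& flat, std::size_t row, std::size_t col) {
    assert(row + 1 < flat.offsets.size());
    const std::size_t idx = flat.offsets[row] + col;
    assert(idx < flat.offsets[row + 1]);
    return flat.data[idx];
}
```

On the GPU side this would presumably become two plain device allocations (one for data, one for offsets), each copied over in a single cudaMemcpy, with the same offsets[row] + col arithmetic inside the kernel.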

Okay, I understand the problem, but how should I solve it?
And my second question: apart from it being a bad idea, don't arrays of structures work if the structures contain arrays?
I am new to CUDA and it looks to me like I haven't understood many things yet :D

You basically have two 1D arrays: one for ‘solutions’ and the other for ‘values’. You index into these separately, with each 1D index computed from the original 2D index. The matrix multiplication example in the Programming Guide shows you how to do this: the matrices there are allocated as 1D arrays and accessed through computed 2D indices.
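As a rough sketch of that layout, assuming every row has the same width and that both arrays hold floats (the thread does not say what the real types or sizes are), it could look like this. Problem and flat_index are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major mapping from a 2D index (row, col) to a 1D offset.
inline std::size_t flat_index(std::size_t row, std::size_t col,
                              std::size_t width) {
    assert(col < width);
    return row * width + col;
}

// Hypothetical holder: two separate flat arrays instead of an array of
// structures that each contain their own arrays.
struct Problem {
    std::size_t rows;
    std::size_t width;
    std::vector<float> solutions;  // rows * width elements
    std::vector<float> values;     // rows * width elements

    Problem(std::size_t r, std::size_t w)
        : rows(r), width(w), solutions(r * w, 0.0f), values(r * w, 0.0f) {}
};
```

Usage is then along the lines of p.values[flat_index(i, j, p.width)]. In a kernel the same row * width + col expression would be applied to the raw device pointers, and if neighbouring threads take neighbouring col values, their accesses land next to each other in memory, which is what coalescing wants.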

As a side point, this is what you should be doing on the CPU anyway if you want maximum performance. The CPU caches help smooth off the edges, but they can’t do everything. I say this while glaring at the current problem in my inbox, which involves something like array[i][j][k].structure, where structure is 264 bytes long…