Memory handling changes since 2.x

I am trying to port some code with complicated trees stored in vector arrays and other nasty things from CUDA 2 and 3 (it still uses cutil). The original code was also written under Linux, which should make things easier.
The problem is that it all crashes with error 77 at memory-handling function calls. Has there been any change since then that I should be aware of, that could affect, for example, how indexing dynamic arrays works?
The behaviour suggests to me that there is sometimes garbage data in the arrays at the requested indices. On one occasion while debugging, I saw that the array contained proper data up to a certain index, and above it just noise.
The code I'm trying to run consists of n-body solvers with Barnes-Hut or other treecodes.

“…at memory handling function calls.”

Which memory function calls are you referring to, in particular?

Is this occurrence device side or host side?

cudaMemcpy (and the ToSymbol variant) - from device to host. Occasionally at a device sync too.

I have already tried 4 different n-body codes and always get the same errors, hence the idea that it could be related to CUDA version differences and not my stupidity.

It is a plausible hypothesis, indeed.

The "occasionally" part is unsettling, though, as it suggests that the exact point of error is still somewhat unclear.

So, are you managing to cudaMemcpy from host to device without error, but when you cudaMemcpy from device to host, you get an error?
If a memory copy from host to device succeeds without error, that would refute the version hypothesis, I would think.

I might have some issues while copying from the host already, and it could escalate to an error on the device only after further processing.
I am also getting error 73 at a memcpy - I can't even find what that refers to.

BTW, "occasionally" means I get that error in one of the 4 codes, with certain parameters - not that the same code produces different errors randomly.
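For what it's worth, you don't have to hunt error numbers down by hand: the runtime can translate its own codes. A minimal sketch (just a throwaway host program, not from any of the ported codes):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the human-readable description of a numeric runtime error
// code, e.g. the mysterious 73, instead of digging through headers.
int main() {
    cudaError_t err = static_cast<cudaError_t>(73);
    std::printf("error %d: %s\n", (int)err, cudaGetErrorString(err));
    return 0;
}
```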

Perhaps ring-fence more, to clearly understand why you are receiving errors.

“I might have some issues while copying from host already”

Perhaps test this, then: put a breakpoint at the host-to-device memory copy and step the code around that point; the debugger should flag immediately if an illegal address is involved.
If the step executes fine, you can take it that the memory copy from host to device at least works, with the implications that come with that.
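A cheap way to ring-fence without the debugger is to check the return value of every runtime call, plus a cudaDeviceSynchronize() after each kernel launch, so the error surfaces at the call that caused it rather than at some later memcpy. A sketch (the CUDA_CHECK macro and the buffer/kernel names are mine, not from the ported codes):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line and the error string as soon as any runtime
// call fails, pinning the failure to a single call site.
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n",         \
                         __FILE__, __LINE__, #call,                \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Usage around the copies discussed above (names hypothetical):
//   CUDA_CHECK(cudaMemcpy(d_tree, h_tree, bytes, cudaMemcpyHostToDevice));
//   build_tree_kernel<<<grid, block>>>(d_tree);
//   CUDA_CHECK(cudaDeviceSynchronize()); // flushes async kernel errors
//   CUDA_CHECK(cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost));
```

Note that a kernel's out-of-bounds access is asynchronous: without the sync after the launch, it typically shows up as error 77 at the next memcpy, which matches the symptom described.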

I was hoping for some well-known changes to avoid this :)

Anyway, I found this:
http://chemaguerra.com/?p=407

at section: 2- Memory alignments.

“…In CUDA 5.0 it is mandatory that said fields start at an offset which is a multiple of 16-byte from the beginning of the structure. You can fix that by re-arranging your fields or by adding some 4-byte ints for padding.”

could it affect declarations like this:

template <class T>
class vec3 {
public:
    T x;
    T y;
    T z;
};

It sounds like too huge a waste of memory to be true…

I found later versions of CUDA to be less forgiving of index errors.
Earlier I had some code which (from memory - it's a long time back) read outside an array but was OK in the sense that it calculated the right answers. (I think it read the data but the data had no impact…) Later versions of CUDA decided this was an error and killed my kernel.
Bill
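Bill's observation suggests a concrete thing to check in the ported kernels: grids are usually rounded up to a multiple of the block size, so the last block has threads past the end of the array, and without an explicit guard those threads touch out-of-bounds memory - exactly the access an old toolkit might tolerate and a newer one kills with error 77. A hedged sketch (kernel and array names are mine, not from the ported codes):

```cuda
#include <cuda_runtime.h>

// Scale the mass array; the i < n guard makes the surplus threads in
// the final (rounded-up) block do nothing instead of reading or
// writing past the end of the array.
__global__ void scale_masses(float* mass, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // bounds guard
        mass[i] *= factor;
}

// Typical rounded-up launch that relies on the guard:
//   int block = 256;
//   int grid  = (n + block - 1) / block;
//   scale_masses<<<grid, block>>>(d_mass, n, 0.5f);
```

Running the binary under the toolkit's memory checker (`cuda-memcheck ./app`) reports the exact kernel, thread and address of any remaining out-of-bounds access, which is far more informative than a bare error 77 at the next API call.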