Order of parameters in a kernel appears to be an undocumented requirement.

A small oddity I have not found an explanation for anywhere in the manuals or other C++ references.

I tried the sample that is provided when creating a new project in CUDA 8.0.
This worked fine.

I added a variable in host memory, copied and modified the given source in all the required places, and added it to the existing addWithCuda sample. It now looks like this.

Note that a and b are the constant input vectors, c is the result vector from the original example, and x is my goofing-around vector with one element.

cudaError_t addWithCuda(int *a, int *b, int *c, unsigned int size, int *x);

Result: nothing is returned in x; incrementing x on the device has no effect.

Modifying it further:

cudaError_t addWithCuda(int *x, int *a, int *b, int *c, unsigned int size);

Result: x is returned properly through cudaMemcpy, as expected.
Incrementing x on the device still has no effect, except in the final stages
for the last two elements in the called kernel code, normally the last few threads.
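In outline, the extension follows the same pattern as the stock addWithCuda wrapper, with one extra device buffer for x. The sketch below is a paraphrase from memory using the second ordering, not the exact code; names such as dev_x are illustrative and the error checking is trimmed.

__global__ void addKernel(int *x, const int *a, const int *b, int *c)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
    if (i == 0) x[0] += 1;   // only one thread increments x, to avoid a race
}

cudaError_t addWithCuda(int *x, int *a, int *b, int *c, unsigned int size)
{
    int *dev_a = 0, *dev_b = 0, *dev_c = 0, *dev_x = 0;

    cudaMalloc((void**)&dev_a, size * sizeof(int));
    cudaMalloc((void**)&dev_b, size * sizeof(int));
    cudaMalloc((void**)&dev_c, size * sizeof(int));
    cudaMalloc((void**)&dev_x, sizeof(int));                    // x has one element

    cudaMemcpy(dev_a, a, size * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, size * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_x, x, sizeof(int), cudaMemcpyHostToDevice);

    addKernel<<<1, size>>>(dev_x, dev_a, dev_b, dev_c);
    cudaError_t cudaStatus = cudaDeviceSynchronize();

    cudaMemcpy(c, dev_c, size * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(x, dev_x, sizeof(int), cudaMemcpyDeviceToHost);  // x is copied back here

    cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c); cudaFree(dev_x);
    return cudaStatus;
}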

Are there any specific limitations or preferences regarding:

A) The order in which parameters are passed?
B) Which kernels may work on global or shared memory and which may not?
(Other than a block's access to the shared memory of all its threads within one SM.)

I could pass this as a reference (x&); however, if *x works in some cases but not all, that seems a bit obscure to me.
After all, the kernel does not rewrite itself between the 15 tested threads working with it.

There are no limitations of the kind you have imagined.

You have made a mistake elsewhere.

What you have shown isn't a kernel call anyway, so treating it as evidence about the order of kernel parameters is a mistake.

No, I have of course adjusted the kernel's input parameters accordingly.

The first one failed; the second worked flawlessly.

I tried both versions, and the only change I made was the order of the parameters, both in the kernel and in the declaration above, extended from the sample source provided with the CUDA 8.0 installation.

It was very odd to say the least.

A final question remains on my fairly steep learning curve, but that is a topic for another thread.

Nobody can show you what you did wrong with only one line of code to look at. If you want help, my suggestion is to provide a short, complete example that someone else could run to see the issue, without having to add or change anything. Any other sort of claim made here just makes it harder for others to help you.

If the order of kernel parameters actually mattered in the way you describe, it would be a bug in CUDA, not something that is by design. CUDA generally attempts to adhere to published C++ standards, and C++ places no such limitation on the order of function parameters.
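To illustrate both points, here is the kind of short, complete example I mean: two kernels with identical bodies but swapped parameter order, launched back to back. Both update c and x the same way, provided each device pointer gets its own allocation and copies. The names here are mine for illustration, not taken from your code.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void addXFirst(int *x, const int *a, const int *b, int *c)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
    if (i == 0) x[0] += 1;   // a single thread increments x
}

__global__ void addXLast(const int *a, const int *b, int *c, int *x)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
    if (i == 0) x[0] += 1;
}

int main()
{
    const int n = 5;
    int a[n] = {1, 2, 3, 4, 5}, b[n] = {10, 20, 30, 40, 50}, c[n], x = 0;
    int *da, *db, *dc, *dx;

    cudaMalloc((void**)&da, n * sizeof(int));
    cudaMalloc((void**)&db, n * sizeof(int));
    cudaMalloc((void**)&dc, n * sizeof(int));
    cudaMalloc((void**)&dx, sizeof(int));

    cudaMemcpy(da, a, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, &x, sizeof(int), cudaMemcpyHostToDevice);

    addXFirst<<<1, n>>>(dx, da, db, dc);   // x passed first
    addXLast<<<1, n>>>(da, db, dc, dx);    // x passed last
    cudaDeviceSynchronize();

    cudaMemcpy(c, dc, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&x, dx, sizeof(int), cudaMemcpyDeviceToHost);
    printf("c[0] = %d, x = %d\n", c[0], x);   // expect c[0] = 11, x = 2

    cudaFree(da); cudaFree(db); cudaFree(dc); cudaFree(dx);
    return 0;
}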

Not to worry, txbob. Thanks for the input.

I cannot move the code for an example off the machine in question, as it is 100% offline and I have restricted access to it as part of a self-imposed “PCP” (Paranoid Computing Policy).

It is of little consequence to my further studies of CUDA.