If, instead of getting the Id through the object, I use a constant value, execution is immediate, but with the object this assignment takes a very long time.
Is it better to work with structures than with classes?
I'm using this version of CUDA:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_May_12_11:09:45_PDT_2011
Cuda compilation tools, release 4.0, V0.2.1221
Calling a method in C++ can involve a cascade of operations: finding out which function is actually to be called (is the method virtual?), jumping to that function (saving registers on the stack, etc.), and finally returning the value.
Accessing a value at a given memory address is a single instruction.
Object orientation can be nice when its features are really required by a program, but do not be fooled into thinking that they come for free!
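To make the difference concrete, here is a minimal sketch in plain host-side C++ (not a CUDA kernel) that reads an id in both ways. All the names (`ParticleData`, `ParticleBase`, `Particle`, `getId`, `sumIdsDirect`, `sumIdsVirtual`) are made up for illustration and are not taken from the original poster's code. The actual cost on a GPU depends on the compiler and hardware, so treat this as an illustration of the two access paths, not a benchmark.

```cpp
#include <vector>

// Plain struct: reading `id` is a load from a known offset.
struct ParticleData {
    int id;
};

// Class hierarchy with a virtual getter: each call goes through the vtable
// unless the compiler can prove the dynamic type and devirtualize the call.
class ParticleBase {
public:
    virtual ~ParticleBase() = default;
    virtual int getId() const = 0;
};

class Particle : public ParticleBase {
public:
    explicit Particle(int id) : id_(id) {}
    int getId() const override { return id_; }
private:
    int id_;
};

// Sums the ids by direct member access.
long long sumIdsDirect(const std::vector<ParticleData>& items) {
    long long sum = 0;
    for (const ParticleData& p : items) sum += p.id;
    return sum;
}

// Sums the ids through a virtual call on each object.
long long sumIdsVirtual(const std::vector<const ParticleBase*>& items) {
    long long sum = 0;
    for (const ParticleBase* p : items) sum += p->getId();
    return sum;
}
```

Both functions return the same result; what differs is the work done per element. A non-virtual getter defined inline in the class is usually optimized down to the same load as the struct version, so the cost comes mainly from virtual dispatch and from calls the compiler cannot inline.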
One thing to keep in mind is that per-thread overheads get multiplied by the number of threads running when you program in parallel.
The gold standard for parallel algorithms is to have the same work efficiency as sequential algorithms, that is, to perform the same total number of operations as the sequential version.
A common mistake among programmers who are used to sequential algorithms is to assume that small constant factors (like function call overheads) don’t matter. If these factors are present in each thread, and you have one thread for each data item (as CUDA programmers are encouraged to do), then they become O(N) factors that can significantly affect performance.
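The arithmetic behind this can be sketched with a toy cost model in C++. The `ThreadCost` struct and both functions are hypothetical, and any numbers fed into them are illustrative rather than measured on real hardware.

```cpp
#include <cstdint>

// Hypothetical per-thread cost model; the values are assumptions, not measurements.
struct ThreadCost {
    std::uint64_t usefulOps;    // operations of real work per thread
    std::uint64_t overheadOps;  // fixed extra operations per thread (e.g. a call)
};

// Total operations across n threads, with one thread per data item.
std::uint64_t totalOps(const ThreadCost& c, std::uint64_t n) {
    return n * (c.usefulOps + c.overheadOps);
}

// Operations spent purely on overhead across n threads: grows linearly with n.
std::uint64_t totalOverheadOps(const ThreadCost& c, std::uint64_t n) {
    return n * c.overheadOps;
}
```

For example, with an assumed 2 useful operations and 10 overhead operations per thread, one million data items give 12 million operations in total, of which 10 million are overhead. The "small constant" ends up dominating the work.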