Passing a device memory pointer between classes

Hi folks,

I’m trying to create a C++ signal processing class library. I have a class called SignalGPU that holds a pointer to device memory, which stores an array of numeric values. That pointer (ssignal) is initialized in the SignalGPU constructor:

template<class T> SignalGPU<T>::SignalGPU(T* sgn, int s)
{
    this->size = s;
    // Allocate device memory, then copy the host array sgn into it.
    HANDLE_ERROR(cudaMalloc(&this->ssignal, this->size * sizeof(T)));
    HANDLE_ERROR(cudaMemcpy(this->ssignal, sgn, this->size * sizeof(T), cudaMemcpyHostToDevice));
}

I have also overloaded the + operator:

template<class T> SignalGPU<T>* SignalGPU<T>::operator+(const ISignal<T>& other) const
{
    T* out;
    HANDLE_ERROR(cudaMalloc(&out, this->size * sizeof(T)));

    SignalGPU<T>* newSignal = SignalGPU<T>::GetDeviceCopy(out, this->size);

    return newSignal;
}

The code compiles but the program fails with this message:

an illegal memory access was encountered in …/src/DataTypes/ at line 200

Line 200 is HANDLE_ERROR(cudaFree(out));

I ran cuda-memcheck and the output is:

========= Invalid global read of size 8
========= at 0x000001c8 in /home/daniel/cuda-work/SAR/Debug/…/src/DataTypes/ CuAddition(double*, double*, int, double*)
========= by thread (671,0,0) in block (10,0,0)
========= Address 0x000154f8 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/ (cuLaunchKernel + 0x331) [0x138251]
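
The numbers in this report can be checked with a little arithmetic. The sketch below assumes a 1D launch with 1024 threads per block (the launch configuration is not shown in this post, so that value is a guess) and assumes each thread reads element blockIdx.x * blockDim.x + threadIdx.x of a double array. The helper names are hypothetical. Under those assumptions, the reported address 0x000154f8 is exactly index * 8 from a base pointer of 0, which would be consistent with a null pointer reaching CuAddition rather than a valid device address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Global 1D thread index, computed the way a kernel typically does:
// blockIdx.x * blockDim.x + threadIdx.x.
constexpr std::uint64_t global_index(std::uint64_t block_idx,
                                     std::uint64_t block_dim,
                                     std::uint64_t thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Byte address touched when element `index` of an array of
// `elem_size`-byte elements starting at `base` is read.
constexpr std::uint64_t element_address(std::uint64_t base,
                                        std::uint64_t index,
                                        std::size_t elem_size) {
    return base + index * elem_size;
}

// Values taken from the cuda-memcheck report: thread (671,0,0),
// block (10,0,0), read of size 8 (one double).
// ASSUMPTION: 1024 threads per block; the real launch is not shown.
constexpr std::uint64_t kIndex = global_index(10, 1024, 671);
// ASSUMPTION: base pointer of 0, i.e. a null pointer.
constexpr std::uint64_t kAddr = element_address(0, kIndex, 8);

static_assert(kIndex == 10911, "global index of the faulting thread");
static_assert(kAddr == 0x154f8, "equals the address reported as out of bounds");
```

If the real block size differs from 1024, the same two functions can be re-evaluated with the actual value to see whether the null-base explanation still holds.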

I think the real problem occurs when the other signal’s pointer is passed to the kernel: it seems that a device memory address cannot be shared between different objects.
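
One inexpensive way to test this hypothesis is a host-side check just before the kernel launch. The sketch below is only an illustration: kernel_args_look_valid is a hypothetical helper, not part of the code in this post, and it only catches null pointers and a non-positive element count. A non-null host address would still pass, so telling host and device addresses apart would need something like the CUDA runtime's cudaPointerGetAttributes.

```cpp
#include <cassert>

// Hypothetical helper: returns true only if every pointer that is about
// to be handed to the kernel is non-null and the element count is positive.
// It cannot tell a host address from a device address.
template <class T>
bool kernel_args_look_valid(const T* a, const T* b, const T* out, int n) {
    return a != nullptr && b != nullptr && out != nullptr && n > 0;
}
```

Calling it with this->ssignal, the pointer obtained from other, out, and this->size right before the launch would show whether the second operand is already invalid on the host side.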
I searched extensively and found the following topic.

But I couldn’t get that solution to work. Do you think this is the problem?

Thank you,