Passing device memory pointer between classes

Hi folks,

I’m trying to create a C++ signal-processing class library. I have a class called SignalGPU that holds a pointer to a device memory location storing an array of numeric values. That pointer (ssignal) is initialized in the SignalGPU constructor:

template<class T> SignalGPU<T>::SignalGPU(T* sgn, int s)
{
    this->size = s;
    HANDLE_ERROR(cudaMalloc(&this->ssignal,this->size * sizeof (T)));
    HANDLE_ERROR(cudaMemcpy(this->ssignal,sgn,this->size * sizeof (T),cudaMemcpyHostToDevice));
}

I have also overloaded the + operator:

template<class T> SignalGPU<T>* SignalGPU<T>::operator+(const ISignal<T>& other) const
{
	T* out;
	HANDLE_ERROR(cudaMalloc(&out,this->size * sizeof (T)));

	CuAddition<T><<<977,1024>>>(this->ssignal,other.ssignal,this->size,out);

	SignalGPU<T>* newSignal = SignalGPU<T>::GetDeviceCopy(out,this->size);

	HANDLE_ERROR(cudaFree(out));

	return newSignal;
}

The code compiles but the program fails with this message:

an illegal memory access was encountered in …/src/DataTypes/SignalGPU.cu at line200

Line 200 is the HANDLE_ERROR(cudaFree(out)); call.

I ran cuda-memcheck and the output is:

========= Invalid global read of size 8
========= at 0x000001c8 in /home/daniel/cuda-work/SAR/Debug/…/src/DataTypes/SignalGPU.cu:9:void CuAddition(double*, double*, int, double*)
========= by thread (671,0,0) in block (10,0,0)
========= Address 0x000154f8 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x331) [0x138251]

I think the real problem is in passing the other signal's pointer to the kernel: it seems that a device memory address cannot be shared between different objects.
I have searched extensively and found the following topic.

But I couldn’t implement that solution successfully. Do you think this is the problem?

Thank you,

Tomás