Overloading between device and host?

Hi, I’d like to overload C++ operators so they work both on the device and on the host, with a different implementation for each. Is this possible? As an example, I am trying to use polymorphism so that operator+ runs a GPU implementation on the device, while the host version calls a kernel that uses this operator.
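
For reference, here is a minimal sketch of the interval type the snippets below assume; only the two-value constructor and the lower()/upper() accessors matter here, everything else about my real class is omitted:

// Minimal sketch of the interval type assumed below; only the
// two-value constructor and lower()/upper() are actually used.
template <class T>
class interval_gpu
{
public:
    __host__ __device__ interval_gpu() {}
    __host__ __device__ interval_gpu(T const &l, T const &u) : low(l), up(u) {}
    __host__ __device__ T const &lower() const { return low; }
    __host__ __device__ T const &upper() const { return up; }
private:
    T low;
    T up;
};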

// This function runs on the device
template<class T> inline __device__
interval_gpu<T> operator+(interval_gpu<T> const &x, interval_gpu<T> const &y)
{
    rounded_arith<T> rnd;
    return interval_gpu<T>(rnd.add_down(x.lower(), y.lower()),
                           rnd.add_up(x.upper(), y.upper()));
}

// kernel to call in the host implementation of the operator
template <class T>
__global__ void add(T a, T b, T *c){
    *c = a + b;
}

template<class T> inline __host__
interval_gpu<T> operator+(interval_gpu<T> const &x, interval_gpu<T> const &y)
{
    interval_gpu<T> c;
    interval_gpu<T> *d_c;
    // Run the addition in a single-thread kernel, then copy the result back
    cudaMalloc((void**)&d_c, sizeof(interval_gpu<T>));
    add<<<1,1>>>(x, y, d_c);
    cudaDeviceSynchronize();
    cudaMemcpy(&c, d_c, sizeof(interval_gpu<T>), cudaMemcpyDeviceToHost);
    cudaFree(d_c);
    return c;
}
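
The goal is that ordinary host code like this (hypothetical values) would pick up the kernel-launching overload:

// Hypothetical host-side usage that should route through
// the kernel-launching overload above
interval_gpu<float> a(1.0f, 2.0f);
interval_gpu<float> b(3.0f, 4.0f);
interval_gpu<float> c = a + b;   // launches add<<<1,1>>> under the hood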

So far I get errors about redeclaring operator+. Is there a workaround?

I’d be ok with a solution such as:

template<class T> inline __host__ __device__
interval_gpu<T> operator+(interval_gpu<T> const &x, interval_gpu<T> const &y)
{
  //if on gpu execute A (device)
  //else execute B (host)
  return result;
}

But can I find out at runtime if I am on the device or on the host? How would I do that?
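
The closest thing I’ve found so far is __CUDA_ARCH__, which as far as I understand splits the two paths at compile time rather than at runtime; something like this sketch, though I’m not sure a compile-time split covers my case:

// Compile-time split: __CUDA_ARCH__ is only defined while nvcc is
// compiling the device pass, so each pass keeps exactly one branch.
template<class T> inline __host__ __device__
interval_gpu<T> operator+(interval_gpu<T> const &x, interval_gpu<T> const &y)
{
#ifdef __CUDA_ARCH__
    // device pass: compute directly with rounded arithmetic
    rounded_arith<T> rnd;
    return interval_gpu<T>(rnd.add_down(x.lower(), y.lower()),
                           rnd.add_up(x.upper(), y.upper()));
#else
    // host pass: launch the kernel and copy the result back
    interval_gpu<T> c;
    interval_gpu<T> *d_c;
    cudaMalloc((void**)&d_c, sizeof(interval_gpu<T>));
    add<<<1,1>>>(x, y, d_c);
    cudaDeviceSynchronize();
    cudaMemcpy(&c, d_c, sizeof(interval_gpu<T>), cudaMemcpyDeviceToHost);
    cudaFree(d_c);
    return c;
#endif
}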

I just tried the following code, with no idea whether it would work, but I get a message that I can’t call a kernel from a function declared on the device with my architecture (compute capability 2.1); launching kernels from device code needs compute capability 3.5 or above. Can anyone think of another way I could do this on my 2.1 device?

// Binary operators
template <class T>
__global__ void add(T a, T b, T *c){
    *c = a + b;
}

template<class T> inline __device__ __host__
interval_gpu<T> operator+(interval_gpu<T> const &x, interval_gpu<T> const &y)
{
    // Attempt at runtime detection: threadIdx and cudaGetLastError
    // were my guesses at telling device and host apart
    int i = threadIdx.x;
    cudaError_t error = cudaGetLastError();
    if(error == cudaSuccess){
        // assume we are on the device: compute directly
        rounded_arith<T> rnd;
        return interval_gpu<T>(rnd.add_down(x.lower(), y.lower()),
                               rnd.add_up(x.upper(), y.upper()));
    }
    else{
        // assume we are on the host: launch the kernel and copy back
        interval_gpu<T> c;
        interval_gpu<T> *d_c;
        cudaMalloc((void**)&d_c, sizeof(interval_gpu<T>));
        add<<<1,1>>>(x, y, d_c);
        cudaDeviceSynchronize();
        cudaMemcpy(&c, d_c, sizeof(interval_gpu<T>), cudaMemcpyDeviceToHost);
        cudaFree(d_c);
        return c;
    }
}
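
For what it’s worth, from what I’ve read, even on a 3.5+ card a kernel launch from device code needs relocatable device code and the device runtime library, e.g. something like (interval.cu is just a placeholder name):

nvcc -arch=sm_35 -rdc=true interval.cu -o interval -lcudadevrt   # interval.cu is a placeholder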