Strange Compilation Problems

I recently put together a computer at home and installed Kubuntu. I followed the instructions and installed the CUDA Toolkit. I was able to compile the samples and run the one called out in the installation instructions, and all seemed well.

Previously, my CUDA programming experience was limited to Microsoft Visual Studio on Windows. Having switched to Linux at home, I began using Nsight Eclipse Edition version 10.0, and I’ve noticed some peculiarities: I cannot seem to use cudaError_t as a return type in template class methods, and I cannot use cudaEvent_t as an input argument in a class method. If I write a free function in a .cu file that returns cudaError_t, that seems to be fine. The problem only seems to appear when these types are used in classes. I’ll provide examples below:

CudaTimer.h

#ifndef CUDA_TIMER_H
	#define CUDA_TIMER_H

	#include <cuda_runtime.h>

	class CudaTimer
	{
		//------------------------------------------------------------------------------------------
		//  Attributes
		//------------------------------------------------------------------------------------------
		private:
			bool m_timeSampled;
			bool m_timerStarted;
			cudaError_t m_lastError;
			cudaEvent_t * m_pStartEvent;
			cudaEvent_t * m_pStopEvent;
			float m_timeInMilliseconds;

		//------------------------------------------------------------------------------------------
		//  Constructor
		//------------------------------------------------------------------------------------------
		public:
			CudaTimer();

		//------------------------------------------------------------------------------------------
		//  Destructor
		//------------------------------------------------------------------------------------------
		public:
			~CudaTimer();

		//------------------------------------------------------------------------------------------
		//  Operations
		//------------------------------------------------------------------------------------------
		private:
			bool CreateEvent(cudaEvent_t ** ppCudaEvent);

		public:
			float GetTime(float & timeInMilliseconds);
			bool StartTimer();
			bool StopTimer();
	};

#endif

CudaTimer.cpp

#include "CudaTimer.h"

bool CudaTimer::CreateEvent(cudaEvent_t ** ppCudaEvent)
{
	if (nullptr == ppCudaEvent)
		return false;

	return true;
}

The result is a “Member declaration not found” error on CudaTimer::CreateEvent. As a test, if I change the input argument to void or to some C primitive type, the error goes away.
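One way to narrow this down is to reproduce the same declare-in-class, define-out-of-class pattern without any CUDA headers. The sketch below does that; `FakeEvent_st`, `FakeEvent_t`, `Timer`, `Probe` and `CheckTimer` are hypothetical names introduced only for illustration, and the opaque pointer typedef is merely assumed to resemble the shape of cudaEvent_t. If a pattern like this builds cleanly from the command line while the IDE still flags the real code, that may suggest the message comes from the editor's code analysis rather than from the compiler, though that is an assumption to verify and not a diagnosis.

```cpp
// Standalone sketch: builds with a plain C++11 compiler, no CUDA headers.
#include <cassert>

// Hypothetical stand-ins: an opaque pointer typedef, assumed to be similar
// in shape to the way cudaEvent_t is exposed by the runtime headers.
struct FakeEvent_st;
typedef FakeEvent_st * FakeEvent_t;

class Timer
{
	private:
		bool CreateEvent(FakeEvent_t ** ppEvent);

	public:
		// Public forwarder so the private method can be exercised.
		bool Probe(FakeEvent_t ** ppEvent) { return CreateEvent(ppEvent); }
};

// Out-of-class definition; the parameter list matches the declaration.
bool Timer::CreateEvent(FakeEvent_t ** ppEvent)
{
	if (nullptr == ppEvent)
		return false;

	return true;
}

// Small self-check: a null argument is rejected, a valid one is accepted.
void CheckTimer()
{
	Timer timer;
	FakeEvent_t * pEvent = nullptr;

	assert(!timer.Probe(nullptr));
	assert(timer.Probe(&pEvent));
}
```

Calling `CheckTimer()` from a `main` function is enough to exercise it. The stand-in type keeps the test independent of the toolkit's include paths, which is the variable being isolated here.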

CudaAdd.cuh

#ifndef CUDA_ADD_H_
	#define CUDA_ADD_H_

	#include <driver_types.h>

	#include "CudaBuffer.h"

	//----------------------------------------------------------------------------------------------
	//  The following declarations are for the CUDA host wrappers to the device calls.
	//----------------------------------------------------------------------------------------------
	template <typename OPERAND1, typename OPERAND2, typename RESULT>
	bool CPU_AddCast(OPERAND1 * pOperand1,
	                 OPERAND2 * pOperand2,
	                 RESULT * pResult,
	                 int elements,
	                 int threads);
	template <typename OPERAND1, typename OPERAND2, typename RESULT>
	bool CPU_CastAdd(OPERAND1 * pOperand1,
	                 OPERAND2 * pOperand2,
	                 RESULT * pResult,
	                 int elements,
	                 int threads);

	template <typename OPERAND1, typename OPERAND2, typename RESULT>
	class CudaAdd
	{
		//------------------------------------------------------------------------------------------
		//  Operations
		//------------------------------------------------------------------------------------------
		public:
			static cudaError_t AddCast(OPERAND1 * pOperand1,
			                           OPERAND2 * pOperand2,
			                           RESULT * pResult,
			                           size_t elements,
			                           int threads);
			static cudaError_t CastAdd(OPERAND1 * pOperand1,
			                           OPERAND2 * pOperand2,
			                           RESULT * pResult,
			                           size_t elements,
			                           int threads);
	};

	cudaError_t DoSomething()
	{
		return cudaSuccess;
	}

	//----------------------------------------------------------------------------------------------
	//  Purpose:  This method adds the first n elements (specified by the value of elements) of
	//            pOperand1 and pOperand2 and stores the results into pResult.  Each element of
	//            pOperand1 is added to the corresponding element of pOperand2; the sum is then
	//            cast to type RESULT and stored into pResult.
	//
	//  Inputs:   pOperand1 - specifies values to use for the first operand of the addition
	//                        operation.
	//            pOperand2 - specifies values to use for the second operand of the addition
	//                        operation.
	//            elements - specifies the first n elements to process in pOperand1, pOperand2 and
	//                       pResult.
	//            threads - specifies the number of threads per block to request when performing the
	//                      addition operation.
	//
	//  Outputs:  pResult - is updated with the results of adding each element of pOperand1 and
	//                      pOperand2.
	//
	//  Returns:  cudaError_t - specifies the success of the operation.
	//                  cudaSuccess - if the operation completed successfully.
	//                  !cudaSuccess - if the operation failed.
	//
	//  Author:   Mr. X
	//
	//  Created:  January 24, 2019
	//----------------------------------------------------------------------------------------------
	template <typename OPERAND1, typename OPERAND2, typename RESULT>
	cudaError_t CudaAdd<OPERAND1, OPERAND2, RESULT>::AddCast(OPERAND1 * pOperand1,
	                                                         OPERAND2 * pOperand2,
	                                                         RESULT * pResult,
	                                                         size_t elements,
	                                                         int threads)
	{
		//------------------------------------------------------------------------------------------
		//  Perform the addition and cast operation on the given buffers.
		//------------------------------------------------------------------------------------------
		CPU_AddCast<OPERAND1, OPERAND2, RESULT>(pOperand1,
		                                        pOperand2,
		                                        pResult,
		                                        elements,
		                                        threads);

		//------------------------------------------------------------------------------------------
		//  Wait for the addition and cast operation to finish.
		//------------------------------------------------------------------------------------------
		cudaDeviceSynchronize();

		//------------------------------------------------------------------------------------------
		//  Check for an error and return the error; will be cudaSuccess if there were no errors.
		//------------------------------------------------------------------------------------------
		return cudaGetLastError();
	}

	//----------------------------------------------------------------------------------------------
	//  Purpose:  This method casts the first n elements (specified by the value of elements) of
	//            pOperand1 and pOperand2 to type RESULT, then adds their values together and stores
	//            the results into pResult.
	//
	//  Inputs:   pOperand1 - specifies values to use for the first operand of the addition
	//                        operation.
	//            pOperand2 - specifies values to use for the second operand of the addition
	//                        operation.
	//            elements - specifies the first n elements to process in pOperand1, pOperand2 and
	//                       pResult.
	//            threads - specifies the number of threads per block to request when performing the
	//                      addition operation.
	//
	//  Outputs:  pResult - is updated with the results of adding each element of pOperand1 and
	//                      pOperand2.
	//
	//  Returns:  cudaError_t - specifies the success of the operation.
	//                  cudaSuccess - if the operation completed successfully.
	//                  !cudaSuccess - if the operation failed.
	//
	//  Author:   Mr. X
	//
	//  Created:  January 24, 2019
	//----------------------------------------------------------------------------------------------
	template <typename OPERAND1, typename OPERAND2, typename RESULT>
	cudaError_t CudaAdd<OPERAND1, OPERAND2, RESULT>::CastAdd(OPERAND1 * pOperand1,
	                                                         OPERAND2 * pOperand2,
	                                                         RESULT * pResult,
	                                                         size_t elements,
	                                                         int threads)
	{
		//------------------------------------------------------------------------------------------
		//  Perform the cast and addition operation on the given buffers.
		//------------------------------------------------------------------------------------------
		CPU_CastAdd<OPERAND1, OPERAND2, RESULT>(pOperand1,
		                                        pOperand2,
		                                        pResult,
		                                        elements,
		                                        threads);

		//------------------------------------------------------------------------------------------
		//  Wait for the cast and addition operation to finish.
		//------------------------------------------------------------------------------------------
		cudaDeviceSynchronize();

		//------------------------------------------------------------------------------------------
		//  Check for an error and return the error; will be cudaSuccess if there were no errors.
		//------------------------------------------------------------------------------------------
		return cudaGetLastError();
	}

#endif

This also results in “Member declaration not found.” If I substitute int for the cudaError_t return type, the problem goes away. It isn’t that I can’t use CUDA types at all; it’s simply that I can’t seem to declare their use in a .h or .cuh file and define their use in the accompanying .cpp file.
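The template case can be isolated the same way. The following is a minimal sketch, assuming a plain enum typedef is a fair stand-in for cudaError_t; `FakeError`, `FakeError_t`, `fakeSuccess`, `fakeErrorInvalidValue`, `Adder` and `CheckAdder` are hypothetical names, and the loop is a host-only placeholder for the real kernel launch. It shows a static member of a class template that returns the enum typedef and is defined outside the class body in the same header.

```cpp
// Standalone sketch: builds with a plain C++11 compiler, no CUDA headers.
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for cudaError_t: an enum exposed through a typedef.
enum FakeError { fakeSuccess = 0, fakeErrorInvalidValue = 1 };
typedef enum FakeError FakeError_t;

template <typename OPERAND1, typename OPERAND2, typename RESULT>
class Adder
{
	public:
		static FakeError_t AddCast(const OPERAND1 * pOperand1,
		                           const OPERAND2 * pOperand2,
		                           RESULT * pResult,
		                           std::size_t elements);
};

// Out-of-class definition of the static member; return type and parameter
// list match the declaration above.
template <typename OPERAND1, typename OPERAND2, typename RESULT>
FakeError_t Adder<OPERAND1, OPERAND2, RESULT>::AddCast(const OPERAND1 * pOperand1,
                                                       const OPERAND2 * pOperand2,
                                                       RESULT * pResult,
                                                       std::size_t elements)
{
	if (nullptr == pOperand1 || nullptr == pOperand2 || nullptr == pResult)
		return fakeErrorInvalidValue;

	// Host-only placeholder: add first, then cast the sum to RESULT.
	for (std::size_t i = 0; i < elements; ++i)
		pResult[i] = static_cast<RESULT>(pOperand1[i] + pOperand2[i]);

	return fakeSuccess;
}

// Small self-check on three elements.
void CheckAdder()
{
	int a[3] = {1, 2, 3};
	float b[3] = {0.5f, 0.5f, 0.5f};
	double r[3] = {0.0, 0.0, 0.0};

	assert((Adder<int, float, double>::AddCast(a, b, r, 3) == fakeSuccess));
	assert(r[0] == 1.5 && r[1] == 2.5 && r[2] == 3.5);
}
```

If this version is accepted by both the compiler and the editor, the remaining difference from the real header is the CUDA type itself and where the tool finds its declaration, which is worth checking before changing any signatures.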

I’ve tried .h, .cuh and .cpp file extensions.

It is as though the compiler is only aware of the CUDA types in certain situations.

Thanks for any help.

Hi SchnellCoder,

Can you try changing the file types from .h/.cpp to .cuh/.cu types?