CUFFT error handling

I’m using the following macro for CUFFT error handling:

#define cufftSafeCall(err)      __cufftSafeCall(err, __FILE__, __LINE__)
inline void __cufftSafeCall(cufftResult err, const char *file, const int line)
{
    if( CUFFT_SUCCESS != err) {
        fprintf(stderr, "cufftSafeCall() CUFFT error in file <%s>, line %i.\n",
                file, line);
        getch(); exit(-1);
    }
}

This macro does not report the message string for an error code. The book “CUDA Programming: a developer’s guide to parallel computing with GPUs” suggests using the following macro:

#define CUDA_CALL(call) { const cudaError_t err = (call); \
if(err != cudaSuccess) \
{ \
    fprintf(stderr, "CUDA error in file '%s', line %d\nerror %d: %s\nterminating!\n",__FILE__, __LINE__,err, \
                            cudaGetErrorString(err)); \
    cudaDeviceReset(); assert(0); \
} }

(note: it has been somewhat customized without altering the functionalities). The book says: “This technique works for all the CUDA calls except for the invocation of kernels.” However, when using CUDA_CALL on a CUFFT routine call, the compiler returns

a value of type "cufftResult" cannot be used to initialize an entity of type "const cudaError_t".

It seems then that cufftResult and cudaError_t are not immediately compatible.

Investigating a bit more, from this NVIDIA CUDA Library link http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/online/group__CUDART__ERROR_g38e5684c158c22144ad3c269ad61bc78.html, it seems that cudaGetErrorString requires a cudaError_t input type.

My questions are the following:

  1. Is there a way to make cufftResult and cudaError_t be compatible, so that I can use CUDA_CALL on CUFFT routines and receive the message string from an error code?
  2. Is there any technical reason for implementing a different error type for the CUFFT library? :-)

Thanks.

JFSebastian,

As you note, the two types are not compatible. Each is just an enum with integral values mapping back to some specific error, and the same value means different things in different enums. For example, 0x1 corresponds to CUDA_ERROR_INVALID_VALUE in the driver API enum (cudaError_enum in cuda.h), whereas a cufftResult of 0x1 corresponds to CUFFT_INVALID_PLAN. You can see the enums in cuda.h (search for cudaError_enum) and cufft.h; the runtime API's cudaError_t is declared in driver_types.h.

I am not aware of a helper function that returns a string describing the error for a given cufftResult_t value, but you can easily roll your own based on the names given to the values in cufft.h. It would be helpful if CUFFT, CUBLAS, etc. all had *GetErrorString() equivalents though.

I’m not sure why NVIDIA chose to use separate error types, but it seems to me to be cleaner than the alternative of merging everything into cudaError_t. Firstly, there is no reason to have any references to CUFFT in cuda.h (same goes for CUBLAS, NPP, etc.). Secondly, cudaError_t would become polluted very quickly if all errors were merged there, so separating error types is similar to namespacing them. Finally, removing errors is problematic for backward compatibility, so what happens if libraries are dropped?
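One way to get a single checking macro despite the separate enum types is C++ function overloading: write one error-to-string function per type and let the compiler pick the right one. The following is only a minimal sketch. It uses stand-in enums (FakeCudaError, FakeCufftResult) so that it compiles without the CUDA toolkit; in real code the overloads would presumably take cudaError_t (forwarding to cudaGetErrorString) and cufftResult (a hand-written switch). The names errorString, isSuccess, describeFailure and CHECK_CALL are hypothetical and not part of any NVIDIA API.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-in enums for illustration only; the real types are cudaError_t and cufftResult.
enum FakeCudaError   { FAKE_CUDA_SUCCESS = 0,  FAKE_CUDA_INVALID_VALUE = 1 };
enum FakeCufftResult { FAKE_CUFFT_SUCCESS = 0, FAKE_CUFFT_INVALID_PLAN = 1 };

// One overload per error type: the same integer value maps to a different name.
inline const char *errorString(FakeCudaError e) {
    switch (e) {
        case FAKE_CUDA_SUCCESS:       return "FAKE_CUDA_SUCCESS";
        case FAKE_CUDA_INVALID_VALUE: return "FAKE_CUDA_INVALID_VALUE";
    }
    return "<unknown>";
}

inline const char *errorString(FakeCufftResult e) {
    switch (e) {
        case FAKE_CUFFT_SUCCESS:      return "FAKE_CUFFT_SUCCESS";
        case FAKE_CUFFT_INVALID_PLAN: return "FAKE_CUFFT_INVALID_PLAN";
    }
    return "<unknown>";
}

inline bool isSuccess(FakeCudaError e)   { return e == FAKE_CUDA_SUCCESS; }
inline bool isSuccess(FakeCufftResult e) { return e == FAKE_CUFFT_SUCCESS; }

// Returns an empty string on success, otherwise "file:line: error N: NAME".
// Works for any type that has errorString/isSuccess overloads.
template <typename E>
std::string describeFailure(E err, const char *file, int line) {
    if (isSuccess(err)) return std::string();
    return std::string(file) + ":" + std::to_string(line) + ": error " +
           std::to_string(static_cast<int>(err)) + ": " + errorString(err);
}

// Single macro usable for both kinds of calls; prints to stderr on failure.
#define CHECK_CALL(call)                                                      \
    do {                                                                      \
        const std::string msg_ = describeFailure((call), __FILE__, __LINE__); \
        if (!msg_.empty()) std::fprintf(stderr, "%s\n", msg_.c_str());        \
    } while (0)
```

With this in place, CHECK_CALL(someCudaCall()) and CHECK_CALL(someCufftCall()) would both compile, and each would print the name belonging to its own enum.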

tbenson,
thank you very much for your answer.

Following Robert Crovella’s answer on Stack Overflow, http://stackoverflow.com/questions/16267149/cufft-error-handling, I have updated my macro as follows

static const char *_cudaGetErrorEnum(cufftResult error)
{
    switch (error)
    {
        case CUFFT_SUCCESS:
            return "CUFFT_SUCCESS";

        case CUFFT_INVALID_PLAN:
            return "CUFFT_INVALID_PLAN";

        case CUFFT_ALLOC_FAILED:
            return "CUFFT_ALLOC_FAILED";

        case CUFFT_INVALID_TYPE:
            return "CUFFT_INVALID_TYPE";

        case CUFFT_INVALID_VALUE:
            return "CUFFT_INVALID_VALUE";

        case CUFFT_INTERNAL_ERROR:
            return "CUFFT_INTERNAL_ERROR";

        case CUFFT_EXEC_FAILED:
            return "CUFFT_EXEC_FAILED";

        case CUFFT_SETUP_FAILED:
            return "CUFFT_SETUP_FAILED";

        case CUFFT_INVALID_SIZE:
            return "CUFFT_INVALID_SIZE";

        case CUFFT_UNALIGNED_DATA:
            return "CUFFT_UNALIGNED_DATA";
    }

    return "<unknown>";
}

inline void __cufftSafeCall(cufftResult err, const char *file, const int line)
{
    if( CUFFT_SUCCESS != err) {
        // Report the caller's location (file, line), not this helper's.
        fprintf(stderr, "CUFFT error in file '%s', line %d\nerror %d: %s\nterminating!\n",
                file, line, (int)err, _cudaGetErrorEnum(err));
        cudaDeviceReset(); assert(0);
    }
}

so that it also reports the error type string. I would be happy to receive any feedback.
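One caveat about terminating with assert(0): the assert macro expands to nothing when NDEBUG is defined, which release builds often do, so the program would then continue after the error. A possible alternative, sketched here with a stand-in enum (FakeCufftResult) instead of the real cufftResult so it compiles without the toolkit, is to throw an exception. The names throwOnCufftError and CUFFT_THROW_ON_ERROR are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for cufftResult, for illustration only.
enum FakeCufftResult { FAKE_CUFFT_SUCCESS = 0, FAKE_CUFFT_EXEC_FAILED = 6 };

// Hypothetical helper: throws instead of relying on assert(0), which is
// removed by the preprocessor when NDEBUG is defined.
inline void throwOnCufftError(FakeCufftResult err, const char *file, int line) {
    if (err == FAKE_CUFFT_SUCCESS) return;
    throw std::runtime_error(std::string("CUFFT error in file '") + file +
                             "', line " + std::to_string(line) +
                             ", error " + std::to_string(static_cast<int>(err)));
}

// Captures the call site, like cufftSafeCall does.
#define CUFFT_THROW_ON_ERROR(call) throwOnCufftError((call), __FILE__, __LINE__)
```

The caller can then decide whether to clean up (for example with cudaDeviceReset()) and exit, or to recover. Calling exit or std::abort directly would be another option that is unaffected by NDEBUG.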

Here’s some feedback. Thanks! That’s awesome.

That’s amazing. Thank you very much. I made some modifications based on your code:

static const char *_cufftGetErrorEnum(cufftResult error)
{
    switch (error)
    {
        case CUFFT_SUCCESS:
            return "CUFFT_SUCCESS";

        case CUFFT_INVALID_PLAN:
            return "The plan parameter is not a valid handle";

        case CUFFT_ALLOC_FAILED:
            return "The allocation of GPU or CPU memory for the plan failed";

        case CUFFT_INVALID_TYPE:
            return "CUFFT_INVALID_TYPE";

        case CUFFT_INVALID_VALUE:
            return "One or more invalid parameters were passed to the API";

        case CUFFT_INTERNAL_ERROR:
            return "An internal driver error was detected";

        case CUFFT_EXEC_FAILED:
            return "cuFFT failed to execute the transform on the GPU";

        case CUFFT_SETUP_FAILED:
            return "The cuFFT library failed to initialize";

        case CUFFT_INVALID_SIZE:
            return "One or more of the parameters is not a supported size";

        case CUFFT_UNALIGNED_DATA:
            return "CUFFT_UNALIGNED_DATA";

        case CUFFT_INCOMPLETE_PARAMETER_LIST:
            return "Missing parameters in call";

        case CUFFT_INVALID_DEVICE:
            return "An invalid GPU index was specified in a descriptor or Execution of a plan was on different GPU than plan creation";

        case CUFFT_PARSE_ERROR:
            return "Internal plan database error";

        case CUFFT_NO_WORKSPACE:
            return "No workspace has been provided prior to plan execution";

        case CUFFT_NOT_IMPLEMENTED:
            return "Function does not implement functionality for parameters given";

        case CUFFT_LICENSE_ERROR:
            return "Used in previous versions";

        case CUFFT_NOT_SUPPORTED:
            return "Operation is not supported for parameters given";
    }

    return "<unknown>";
}

inline void print_cuFFT_error_if_any(cufftResult err, int num) {
    if (CUFFT_SUCCESS != err)
    {
        printf("\ncuFFT error !!! <%s> !!! \nin file '%s', line %d at CUDA call error code: # %d\n",
               _cufftGetErrorEnum(err), __FILE__, __LINE__, num);
        fflush(stdout);

        // outputs error file
        FILE* fp;
        int myrank;
        char filename[BUFSIZ];

#ifdef WITH_MPI
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
#else
        myrank = 0;
#endif
        sprintf(filename, OUTPUT_FILES "/error_message_%06d.txt", myrank);
        fp = fopen(filename, "a+");
        if (fp != NULL) {
            fprintf(fp, "\ncuFFT error !!! <%s> !!! \nin file '%s', line %d at CUDA call error code: # %d\n",
                    _cufftGetErrorEnum(err), __FILE__, __LINE__, num);
            fclose(fp);
        }

        // stops program
#ifdef WITH_MPI
        MPI_Abort(MPI_COMM_WORLD, 1);
#endif
        exit(EXIT_FAILURE);
    }
}
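A small note on the per-rank log file name: sprintf does not check the buffer size, so a long output directory could overflow filename. A bounded variant could look like the sketch below. The helper makeErrorLogName is hypothetical, and its dir argument stands in for the OUTPUT_FILES macro, whose value is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: writes "<dir>/error_message_<rank>.txt" into buf.
// Returns false if formatting failed or the name did not fit, instead of
// writing past the end of the buffer.
inline bool makeErrorLogName(char *buf, std::size_t size, const char *dir, int rank) {
    const int n = std::snprintf(buf, size, "%s/error_message_%06d.txt", dir, rank);
    return n >= 0 && static_cast<std::size_t>(n) < size;
}
```

If the helper returns false, the caller could skip writing the file and still print the message to stdout before aborting.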