__global__ function call is not configured

Hello!
Recently I ran into a problem with CUDA (cuda-9.0).
My GPU is a TITAN XP, the driver is 387.84 for Linux, and the OS is CentOS 7.0.

I hit this error:
"__global__ function call is not configured" returned from 'cublasCreate(&handle_)'

And here is the relevant code:

#include <unistd.h>

void CuDevice::FinalizeActiveGpu() {
  // The device at this point should have an active GPU, so we can query its
  // name and memory stats and notify the user which GPU is finally used.

  // Get the device-id of the active device:
  {
    int32 act_gpu_id;
    int32 count = 0;  // note: must be initialized, or the loop reads garbage
    cudaError_t e;
    while (count < 3) {
      e = cudaGetDevice(&act_gpu_id);
      KALDI_LOG << "while*******" << act_gpu_id;
      if (e != cudaSuccess) {
        sleep(10);
        count++;
        continue;
      } else {
        break;
      }
    }
    if (count == 3) {
      KALDI_CUDA_ERR(e, "Failed to get device-id of active device.");
    }
    // Remember the id of the active GPU.
    active_gpu_id_ = act_gpu_id;  // CuDevice::Enabled() is true from now on
    KALDI_LOG << "*******" << act_gpu_id;
    // Initialize CUBLAS.
    CU_SAFE_CALL(cublasCreate(&handle_));

    // Notify the user which GPU is finally used.
    char name[128];
    DeviceGetName(name, 128, act_gpu_id);

    CU_SAFE_CALL(cudaGetDeviceProperties(&properties_, act_gpu_id));

    KALDI_LOG << "The active GPU is [" << act_gpu_id << "]: " << name << "\t"
              << GetFreeMemory(&free_memory_at_startup_, NULL) << " version "
              << properties_.major << "." << properties_.minor;
  }
  return;
}

Could you please offer some help?

You don’t include cublas_v2.h ?

What is the compile command line?
Provide the complete file you are trying to compile. This clearly isn’t it.

Of course we have compiled all the files successfully.
We ran the job for several iterations, but it suddenly aborted with this error.

The error occurs randomly.

That shouldn’t happen. “__global__ function call is not configured” normally means that no valid grid and block configuration was passed to a kernel launch, e.g. calling a __global__ function without the launch-configuration parameters in angle brackets (<<<...>>>).
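For reference, a minimal CUDA sketch of what a correctly configured launch looks like (the kernel name `scale` is hypothetical, not from the poster's code; this needs nvcc and a GPU to run):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only to illustrate the launch syntax.
__global__ void scale(float *x, float a, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= a;
}

int main() {
  const int n = 256;
  float *d_x = nullptr;
  cudaError_t e = cudaMalloc(&d_x, n * sizeof(float));
  if (e != cudaSuccess) { std::printf("%s\n", cudaGetErrorString(e)); return 1; }

  // Correct: launch configuration in angle brackets (grid, block).
  // Omitting <<<...>>> (or launching via corrupted internal state) is
  // what produces "__global__ function call is not configured".
  scale<<<(n + 127) / 128, 128>>>(d_x, 2.0f, n);

  // Check both the launch itself and its asynchronous completion.
  e = cudaGetLastError();
  if (e != cudaSuccess) std::printf("launch: %s\n", cudaGetErrorString(e));
  e = cudaDeviceSynchronize();
  if (e != cudaSuccess) std::printf("sync: %s\n", cudaGetErrorString(e));

  cudaFree(d_x);
  return 0;
}
```

Note that cublasCreate itself launches internal kernels, which is why a configuration error from an earlier corrupted state can surface there.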

This might be a follow-on issue to a previous failure that wasn’t caught. Review status checking for every CUDA and CUBLAS API call. Try running under control of cuda-memcheck to see whether it reports any issues. Check for memory corruption on the host with valgrind.