Internal debugger error occurred while attempting to launch cublas batched dgemm

I have the following application of the cuBLAS batched matrix multiplication function cublasDgemmBatched. The code stops for an “unknown reason”, and I would like to know what that reason is. I’m using VS2017 and CUDA 9.1 with a GeForce 940MX.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "cublas_v2.h"

#include <stdio.h>
#include <stdlib.h>

int kernel(double *A, int N, int batchCt) //N is the dimension of matrix
{
    cublasHandle_t handle;
    cublasCreate(&handle);
    double alpha = 1.0;
    double beta = 1.0;
    double *d_A = NULL; //device copy of matrix data
    double **pa = NULL; //array of pointers to matrix data
    double **pb = NULL;
    double **pc = NULL;
    double **d_pa = NULL; //device copy of pa
    double **d_pb = NULL;
    double **d_pc = NULL;
    cudaMalloc((void **)&d_A, 3 * batchCt * N * N * sizeof(double)); //allocate device memory for d_A
    cudaMemcpy(d_A, A, 3 * batchCt * N * N * sizeof(double), cudaMemcpyHostToDevice); //memory copy
    pa = (double **)malloc(3 * batchCt * sizeof(double *)); //allocate host memory for pa
    pb = pa + batchCt;
    pc = pa + 2 * batchCt;
    for (int k = 0; k < batchCt; k++) //compute the pointers to matrix data on device
    {
        pa[k] = d_A + k * N * N;
        pb[k] = pa[k] + batchCt * N * N;
        pc[k] = pa[k] + 2 * batchCt * N * N;
    }
    cudaMalloc((void **)&d_pa, 3 * batchCt * sizeof(double *)); //allocate device memory for d_pa
    cudaMemcpy(d_pa, pa, 3 * batchCt * sizeof(double *), cudaMemcpyHostToDevice); //memory copy
    d_pb = d_pa + batchCt;
    d_pc = d_pa + 2 * batchCt;
    cublasDgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N, &alpha, (const double **)d_pa, N, (const double **)d_pb, N, &beta, d_pc, N, batchCt);
    cudaMemcpy(A, d_A, 3 * batchCt * N * N * sizeof(double), cudaMemcpyDeviceToHost);
    cublasDestroy(handle);
    cudaFree(d_A);
    cudaFree(d_pa);
    free(pa);
    return 0;
}

int main()
{
    int batchCt = 3;
    int N = 3;
    double *A = (double *)malloc(3 * batchCt * N * N * sizeof(double));
    for (int k = 0; k < 3 * batchCt * N * N; k++)
        A[k] = k * k / 2 + 1.0; //arbitrary numbers
    int result = kernel(A, N, batchCt);
    free(A);
    return result;
}

The debugging output is “Internal debugger error occurred while attempting to launch maxwell_dgemm_64x64_nn in CUcontext 0x1efc447bb00, CUmodule 0x1efe2482130: code patching failed for unknown reason. All breakpoints for function maxwell_dgemm_64x64_nn have been removed.”

Note: I removed the API error checking functions since they didn’t give me any error information.

The debugger is telling us that it hit a bug inside the debugger itself. Given that this happens with CUDA 9.1, you would want to file a bug report with NVIDIA (the bug-reporting form is linked from the registered developer website) so the engineering team responsible for the debugger can find the root cause of the failure.
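
While waiting on a bug report, it may help to check the pointer layout and the expected result of the batched call without involving the GPU or the debugger at all. The following is only a host-side sketch under assumptions: the function names dgemm_batched_ref and run_reference are hypothetical (they are not part of cuBLAS), and the loop simply computes C[k] = alpha * A[k] * B[k] + beta * C[k] on column-major N x N matrices, using the same pa/pb/pc layout as the posted kernel().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical host reference, not a cuBLAS function: for every batch k,
   C[k] = alpha * A[k] * B[k] + beta * C[k], with column-major N x N matrices. */
static void dgemm_batched_ref(int N, double alpha, double **A, double **B,
                              double beta, double **C, int batchCt)
{
    for (int k = 0; k < batchCt; k++)
        for (int col = 0; col < N; col++)
            for (int row = 0; row < N; row++) {
                double sum = 0.0;
                for (int i = 0; i < N; i++)
                    sum += A[k][row + i * N] * B[k][i + col * N];
                C[k][row + col * N] = alpha * sum + beta * C[k][row + col * N];
            }
}

/* Builds the same layout as the posted kernel(): one contiguous buffer that
   holds all A matrices, then all B matrices, then all C matrices. Here the
   pointers refer to host memory instead of device memory. */
static void run_reference(double *data, int N, int batchCt)
{
    double **pa = (double **)malloc(3 * batchCt * sizeof(double *));
    assert(pa != NULL);
    double **pb = pa + batchCt;
    double **pc = pa + 2 * batchCt;
    for (int k = 0; k < batchCt; k++) {
        pa[k] = data + k * N * N;
        pb[k] = pa[k] + batchCt * N * N;
        pc[k] = pa[k] + 2 * batchCt * N * N;
    }
    dgemm_batched_ref(N, 1.0, pa, pb, 1.0, pc, batchCt); /* alpha = beta = 1.0 as in the post */
    free(pa);
}
```

Calling run_reference on a copy of the input array, then comparing its last batchCt * N * N entries with what kernel() writes back when the program is run without the debugger attached, would show whether the cuBLAS call itself produces the expected values.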