Unable to create context on nvidia A40: Accelerator Fatal Error: call to cuCtxCreate returned error 801: Other

Hello everyone,

I’m porting my code to run on GPUs using OpenACC. I’ve been given access to a virtual server with an NVIDIA A40-8Q graphics card. I’ve installed the HPC SDK 24.1 and compiled the provided OpenACC examples, but they fail when I run them on the server.

The output I’m getting is:

make acc_f1_test
cd acc_f1; make build; make run; make clean
make[1]: Entering directory ‘/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1’
nvfortran -fast -Minfo -acc -o acc_f1.out acc_f1.f90
main:
25, Loop not fused: function call before adjacent loop
Generated vector simd code for the loop
28, Generating implicit copyin(a(1:n)) [if not already present]
Generating implicit copyout(r(1:n)) [if not already present]
29, Loop is parallelizable
Generating NVIDIA GPU code
29, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
29, Loop not fused: no successor loop
Generated vector simd code for the loop
33, Generated vector simd code for the loop
38, Loop not vectorized/parallelized: contains call
make[1]: Leaving directory ‘/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1’
make[1]: Entering directory ‘/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1’
./acc_f1.out
Failing in Thread:1
Accelerator Fatal Error: call to cuCtxCreate returned error 801: Other
File: /data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1/acc_f1.f90
Function: main
Line: 28

make[1]: *** [Makefile:28: run] Error 1
make[1]: Leaving directory ‘/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1’
make[1]: Entering directory ‘/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1’
Cleaning up…
make[1]: Leaving directory ‘/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1’

The output of nvaccelinfo is:

nvaccelinfo

CUDA Driver Version: 12020
NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.183.01 Sun May 12 19:39:15 UTC 2024

Device Number: 0
Device Name: NVIDIA A40-8Q
Device Revision Number: 8.6
Global Memory Size: 8266973184
Number of Multiprocessors: 84
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1740 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 7251 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 6291456 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Managed Memory: No
Preemption Supported: Yes
Cooperative Launch: Yes
Default Target: cc86

And the output of nvidia-smi is:

nvidia-smi
Wed Nov 20 15:56:16 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A40-8Q                  Off | 00000000:06:10.0 Off |                    0 |
| N/A   N/A    P8              N/A /  N/A |      0MiB /  8064MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Does anyone have an idea of what is going on, and how to correct this issue?

Thanks in advance and best regards.

Juan Carlos

Hi Juan Carlos,

It looks like the runtime can’t create a CUDA context. My best guess is that there’s an issue with the vGPU setup rather than with the program, but I’m not an expert with vGPUs, so I’m not sure how to diagnose it.

Are you able to successfully run a simple CUDA C example program?

If not, then we can move your post over to the virtual GPU forums for help.

-Mat
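The OpenACC runtime labels error 801 only as "Other". In the CUDA driver API header (cuda.h), 801 is CUDA_ERROR_NOT_SUPPORTED. As a small aid for reading such logs, here is a minimal sketch of a lookup from numeric CUresult codes to their names. The table is a hand-copied, partial subset of cuda.h and the names cu_codes and cu_result_name are hypothetical; a program linked against the driver should call cuGetErrorName() instead.

```c
#include <stddef.h>

/* Partial, hand-copied subset of the CUresult enum from cuda.h.
 * Meant only for decoding numbers seen in logs; cuGetErrorName()
 * is the authoritative source inside a program using the driver. */
struct cu_code {
    int value;
    const char *name;
};

static const struct cu_code cu_codes[] = {
    {0,   "CUDA_SUCCESS"},
    {1,   "CUDA_ERROR_INVALID_VALUE"},
    {2,   "CUDA_ERROR_OUT_OF_MEMORY"},
    {3,   "CUDA_ERROR_NOT_INITIALIZED"},
    {100, "CUDA_ERROR_NO_DEVICE"},
    {101, "CUDA_ERROR_INVALID_DEVICE"},
    {800, "CUDA_ERROR_NOT_PERMITTED"},
    {801, "CUDA_ERROR_NOT_SUPPORTED"},
    {999, "CUDA_ERROR_UNKNOWN"},
};

/* Return the symbolic name for a numeric CUresult, or a placeholder
 * string when the code is not in the partial table above. */
const char *cu_result_name(int code)
{
    size_t n = sizeof cu_codes / sizeof cu_codes[0];
    for (size_t i = 0; i < n; i++) {
        if (cu_codes[i].value == code)
            return cu_codes[i].name;
    }
    return "(not in this partial table)";
}
```

For example, cu_result_name(801) returns "CUDA_ERROR_NOT_SUPPORTED". That is consistent with the guess that the driver is refusing context creation in this vGPU configuration rather than the program doing something wrong.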

Hello Mat,

Thanks for your quick answer. I ran the following simple CUDA C example on the virtual server, and it failed.

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

// --- global variables ----------------------------------------------------
CUdevice device;
CUcontext context;
CUresult err;
size_t totalGlobalMem;

// --- functions -----------------------------------------------------------
void initCUDA()
{
    int deviceCount = 0;

    printf("- Initializing...\n");

    // initialize the driver API before any other driver call
    err = cuInit(0);
    if (err == CUDA_SUCCESS)
        cuDeviceGetCount(&deviceCount);

    if (deviceCount == 0) {
        fprintf(stderr, "Error: no devices supporting CUDA\n");
        exit(-1);
    }

    // get CUDA device
    cuDeviceGet(&device, 0);
    char name[100];
    cuDeviceGetName(name, 100, device);
    printf("> Using device 0: %s\n", name);

    // print some info
    cuDeviceTotalMem(&totalGlobalMem, device);
    printf("  Total amount of global memory:   %llu bytes\n",
           (unsigned long long)totalGlobalMem);

    // try to create context
    err = cuCtxCreate(&context, 0, device);
    if (err != CUDA_SUCCESS) {
        // no context exists after a failed create, so there is nothing to destroy
        fprintf(stderr, "* Error initializing the CUDA context.\n");
        exit(-1);
    }
    else {
        fprintf(stderr, "* CUDA context created successfully.\n");
    }
}

void finalizeCUDA()
{
    err = cuCtxDestroy(context);
    if (err != CUDA_SUCCESS) {
        fprintf(stderr, "* Error destroying the CUDA context.\n");
        exit(-1);
    }
    else {
        fprintf(stderr, "* CUDA context destroyed successfully.\n");
    }
    printf("- Finalizing...\n");
}

int main(int argc, char **argv)
{
    initCUDA();
    finalizeCUDA();
    return 0;
}

I compiled the code with nvcc context.cpp -ccbin nvc++ -Xcompiler -cudalib
and running it gave this output:

- Initializing...
> Using device 0: NVIDIA A40-8Q
  Total amount of global memory:   8266973184 bytes
* Error initializing the CUDA context.

I also ran the same code successfully on my local workstation, with this output:

- Initializing...
> Using device 0: NVIDIA GeForce GTX 1050 Ti
  Total amount of global memory:   4231790592 bytes
* CUDA context created successfully.
* CUDA context destroyed successfully.
- Finalizing...

It’s probably a good idea to move the post to the virtual GPU forums. Could you help me do so, please?

Best regards JC.
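The test program above prints only a generic message when cuCtxCreate fails, so a variant that prints the raw driver result and its name may be more useful when asking on the vGPU forum. This is a minimal sketch under stated assumptions, not code from the thread: it assumes a Linux guest whose driver library is named libcuda.so.1, loads it with dlopen() so no CUDA headers or toolkit are needed, and declares its own stand-in types for CUresult, CUdevice and CUcontext. It looks up cuCtxCreate_v2 and cuCtxDestroy_v2, the versioned symbols that the CUDA 12.x cuda.h maps cuCtxCreate and cuCtxDestroy to. The helper name probe_cuda_context is hypothetical.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the cuda.h types (assumption: CUresult is an int-sized
 * enum, CUdevice is an int and CUcontext is an opaque pointer). */
typedef int cu_result_t;
typedef int cu_device_t;
typedef void *cu_context_t;

/* Hypothetical helper: load the driver, then run cuInit, cuDeviceGet and
 * cuCtxCreate on device 0, stopping at the first failure. Prints the
 * driver result with its name and returns it (0 means the context was
 * created and destroyed again). Returns -1 if the driver library or one
 * of the expected symbols cannot be found. */
int probe_cuda_context(void)
{
    void *lib = dlopen("libcuda.so.1", RTLD_NOW);
    if (lib == NULL) {
        fprintf(stderr, "could not load libcuda.so.1: %s\n", dlerror());
        return -1;
    }

    cu_result_t (*p_init)(unsigned int) =
        (cu_result_t (*)(unsigned int))dlsym(lib, "cuInit");
    cu_result_t (*p_device_get)(cu_device_t *, int) =
        (cu_result_t (*)(cu_device_t *, int))dlsym(lib, "cuDeviceGet");
    cu_result_t (*p_ctx_create)(cu_context_t *, unsigned int, cu_device_t) =
        (cu_result_t (*)(cu_context_t *, unsigned int, cu_device_t))
            dlsym(lib, "cuCtxCreate_v2");
    cu_result_t (*p_ctx_destroy)(cu_context_t) =
        (cu_result_t (*)(cu_context_t))dlsym(lib, "cuCtxDestroy_v2");
    cu_result_t (*p_error_name)(cu_result_t, const char **) =
        (cu_result_t (*)(cu_result_t, const char **))
            dlsym(lib, "cuGetErrorName");

    if (!p_init || !p_device_get || !p_ctx_create || !p_ctx_destroy ||
        !p_error_name) {
        fprintf(stderr, "libcuda.so.1 lacks an expected symbol\n");
        dlclose(lib);
        return -1;
    }

    cu_device_t dev = 0;
    cu_context_t ctx = NULL;
    cu_result_t rc = p_init(0);
    if (rc == 0)
        rc = p_device_get(&dev, 0);
    if (rc == 0)
        rc = p_ctx_create(&ctx, 0, dev);

    /* cuGetErrorName sets the name to NULL for codes it does not know. */
    const char *name = NULL;
    if (p_error_name(rc, &name) != 0 || name == NULL)
        name = "unrecognized code";
    printf("driver result: %d (%s)\n", rc, name);

    if (rc == 0)
        p_ctx_destroy(ctx);
    dlclose(lib);
    return rc;
}
```

Calling probe_cuda_context() from a small main and building with a plain C compiler (adding -ldl on glibc versions older than 2.34) should, on the failing server, report the same numeric code the OpenACC runtime saw, together with its symbolic name. Loading the driver dynamically also separates "driver library missing" from "driver present but context creation refused".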

Hi JC,

I moved your post over to the Virtual GPU General Discussion forum. I’m not sure this is the correct spot, but hopefully someone here has ideas on how to get your vGPU to create a context.

Note that if a different Virtual GPU forum would work better, you can edit your original post; you’ll see a drop-down menu where you can select a different forum.

-Mat


Hello, I also have the same problem with a virtual GPU. I get the following error:
Failing in Thread:1
Accelerator Fatal Error: call to cuCtxCreate returned error 801: Other
File: /data/eperez/Workbench/Investigacion/OpenACC/calor2DAcc/ecCalor2D_tdma_parallel.f90
Function: calor2d:1
Line: 68
I have no issues compiling my program, only when running it.
Best regards.