Hello everyone,
I’m porting my code to run on GPUs using OpenACC. I’ve been given access to a virtual server with an NVIDIA A40-8Q virtual GPU. I’ve installed the HPC SDK 24.1 and compiled the OpenACC examples provided with it, but they fail when I try to run them on the server.
The output I’m getting is:
make acc_f1_test
cd acc_f1; make build; make run; make clean
make[1]: Entering directory '/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1'
nvfortran -fast -Minfo -acc -o acc_f1.out acc_f1.f90
main:
     25, Loop not fused: function call before adjacent loop
         Generated vector simd code for the loop
     28, Generating implicit copyin(a(1:n)) [if not already present]
         Generating implicit copyout(r(1:n)) [if not already present]
     29, Loop is parallelizable
         Generating NVIDIA GPU code
         29, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     29, Loop not fused: no successor loop
         Generated vector simd code for the loop
     33, Generated vector simd code for the loop
     38, Loop not vectorized/parallelized: contains call
make[1]: Leaving directory '/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1'
make[1]: Entering directory '/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1'
./acc_f1.out
Failing in Thread:1
Accelerator Fatal Error: call to cuCtxCreate returned error 801: Other
File: /data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1/acc_f1.f90
Function: main
Line: 28
make[1]: *** [Makefile:28: run] Error 1
make[1]: Leaving directory '/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1'
make[1]: Entering directory '/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1'
Cleaning up...
make[1]: Leaving directory '/data/jcajas/ejemplosOACC/OpenACC/samples/acc_f1'
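A minimal diagnostic sketch, not part of the original report: it calls cuInit, cuDeviceGet and cuCtxCreate from the CUDA driver library directly (the last being the call the log reports as failing), outside of nvfortran and the OpenACC runtime, to check whether the same error appears there. It assumes a Linux guest with Python 3 and the driver library available as libcuda.so.1; the function name try_create_context is hypothetical. In the CUDA driver API headers, code 801 is CUDA_ERROR_NOT_SUPPORTED, so a ("cuCtxCreate", 801) result here would suggest a driver or vGPU configuration issue rather than a problem with the compiled example.

```python
import ctypes
import ctypes.util


def try_create_context(device_ordinal=0):
    """Create and destroy one CUDA context using only the driver API.

    Returns (step, code): the first driver call that failed and its
    CUresult, ("load", None) if the driver library cannot be opened,
    or ("ok", 0) if a context was created and destroyed successfully.
    """
    # Assumption: Linux guest where the driver library is libcuda.so.1.
    path = ctypes.util.find_library("cuda") or "libcuda.so.1"
    try:
        cuda = ctypes.CDLL(path)
    except OSError:
        return ("load", None)

    # cuInit must succeed before any other driver API call.
    rc = cuda.cuInit(0)
    if rc != 0:
        return ("cuInit", rc)

    # CUdevice is a plain int handle.
    dev = ctypes.c_int()
    rc = cuda.cuDeviceGet(ctypes.byref(dev), device_ordinal)
    if rc != 0:
        return ("cuDeviceGet", rc)

    # In cuda.h, cuCtxCreate is a macro for the exported cuCtxCreate_v2.
    ctx = ctypes.c_void_p()
    rc = cuda.cuCtxCreate_v2(ctypes.byref(ctx), 0, dev)
    if rc != 0:
        return ("cuCtxCreate", rc)

    # Clean up the context we just created.
    cuda.cuCtxDestroy_v2(ctx)
    return ("ok", 0)


if __name__ == "__main__":
    step, code = try_create_context(0)
    print(step, code)
```

Running it with python3 on the server prints the first failing driver call and its CUresult code.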
The output of nvaccelinfo is:
nvaccelinfo
CUDA Driver Version: 12020
NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.183.01 Sun May 12 19:39:15 UTC 2024

Device Number: 0
Device Name: NVIDIA A40-8Q
Device Revision Number: 8.6
Global Memory Size: 8266973184
Number of Multiprocessors: 84
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1740 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 7251 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 6291456 bytes
Max Threads Per SMP: 1536
Async Engines: 2
Unified Addressing: Yes
Managed Memory: No
Preemption Supported: Yes
Cooperative Launch: Yes
Default Target: cc86
And the output of nvidia-smi is:
nvidia-smi
Wed Nov 20 15:56:16 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A40-8Q                  Off | 00000000:06:10.0 Off |                    0 |
| N/A   N/A    P8              N/A /  N/A |      0MiB /  8064MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
Does anyone have an idea of what is causing this error and how to fix it?
Thanks in advance and best regards.
Juan Carlos