Developing with CUDA Fortran in a cluster environment

I have recently found that developing with CUDA Fortran in a cluster environment is quite difficult.

Since the compiler makes a lot of assumptions about the target machine, the code fails when submitted as a batch job with the following error:

/opt/slurm/data/slurmd/job3063367/slurm_script: line 11: 16930 Illegal instruction (core dumped) ./a.out

This makes development very difficult.

Given the information I have about the GPU, how do I determine which compiler options to use so that the program runs successfully?

Here is the information about the GPU that I get from an interactive job:

CUDA Driver Version: 9000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.81 Sat Sep 2 02:43:11 PDT 2017

Device Number: 0
Device Name: Tesla K80
Device Revision Number: 3.7
Global Memory Size: 11995578368
Number of Multiprocessors: 13
Number of SP Cores: 2496
Number of DP Cores: 832
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 823 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: exclusive-process
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 2505 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 1572864 bytes
Max Threads Per SMP: 2048
Async Engines: 2
Unified Addressing: Yes
Managed Memory: Yes
PGI Compiler Option: -ta=tesla:cc35

Hi Aketh,

An illegal instruction error actually comes from the host side. You must have different CPU types on the different nodes of your cluster.

To fix this, compile with “-tp=px” to target a generic x86 CPU, or target the lowest common processor type in your cluster. To see the list of supported processors, run “pgfortran -help -tp”. You can also check which processor each node type has by running “pgcpuid” on it.

Also, for the GPU side, you can pass a list of target devices to the “-Mcuda” option. I believe you were using “-Mcuda=kepler+”, which includes all Kepler devices as well as Maxwell. If you wanted to include Pascal as well, you could use “-Mcuda=cc35,cc50,cc60”.
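As a quick way to verify the flags on your cluster, here is a minimal sketch that combines both suggestions. The file name, module, and kernel below are hypothetical and only meant as a portability check; the exact flags you need depend on your PGI release and on the CPUs/GPUs actually present.

! Minimal CUDA Fortran check (hypothetical file test.cuf).
! Compile with something like:
!   pgfortran -tp=px -Mcuda=cc35,cc50,cc60 -o test test.cuf

module kernels
  use cudafor
contains
  attributes(global) subroutine add_one(a, n)
    integer, value  :: n
    integer, device :: a(n)
    integer :: i
    i = threadIdx%x
    if (i <= n) a(i) = a(i) + 1
  end subroutine add_one
end module kernels

program test
  use cudafor
  use kernels
  integer, parameter :: n = 32
  integer, device :: d_a(n)
  integer :: h_a(n)
  h_a = 0
  d_a = h_a                        ! host-to-device copy
  call add_one<<<1, n>>>(d_a, n)   ! one block of one warp
  h_a = d_a                        ! device-to-host copy
  print *, 'result (expect all 1s):', h_a
end program test

If this binary runs correctly when submitted through your batch script on every node type, the chosen -tp and -Mcuda targets should be portable enough for your cluster.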

Hope this helps,
Mat