Hello,
I have a GeForce RTX 2070 and I'm trying to port some code to the GPU with OpenACC. I added some acc pragmas to my code and compiled with -ta=tesla:cc75, which seems to be the appropriate target for my GPU.
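The pragmas I added are of this general shape; this is a simplified, self-contained sketch with placeholder names (countMarked, markSize, etc.), not the actual CoinFactorization code:

// Simplified sketch of the kind of loop I annotated; indexIn values are assumed
// to be valid indices into mark.
static int countMarked(const int *indexIn, const char *mark,
                       int numberNonZero, int markSize)
{
  int nMarked = 0;
  #pragma acc parallel loop gang vector reduction(+:nMarked) \
          copyin(indexIn[0:numberNonZero], mark[0:markSize])
  for (int i = 0; i < numberNonZero; ++i) {
    if (mark[indexIn[i]])   // mark is only read here, so the reduction is the only shared update
      ++nMarked;
  }
  return nMarked;
}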
The compiler invocation is as follows:
pgc++ -DHAVE_CONFIG_H -I. -I`echo /home/sermus/projects/coin-clp-latest-gpu/CoinUtils/CoinUtils/src`
-g -fast -acc -ta=tesla:cc75 -Mprof=ccff -Minfo=accel -DCOINUTILS_BUILD -c -o CoinFactorization3.lo /home/sermus/projects/coin-clp-latest-gpu/CoinUtils/CoinUtils/src/CoinFactorization3.cpp
And the output seems to be fine:
CoinFactorization::updateColumnUSparsish(CoinIndexedVector *, int ) const:
1386, Generating copyin(indexIn[:numberNonZero]) [if not already present]
Generating copy(stack[:this->_b_19CoinArrayWithLength.size/4]) [if not already present]
Generating present(next[:],mark[:],list[:])
Generating Tesla code
1396, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
1404, Generating implicit reduction(+:nMarked)
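For context on the present(next[:],mark[:],list[:]) line: a present() clause only asserts that those arrays are already on the device, e.g. because of an earlier enter data directive or an enclosing data region. A minimal, self-contained illustration of that pattern (using a hypothetical markPtr array, nothing from the CoinUtils source) looks like this:

#include <vector>

void presentExample(int n)
{
  std::vector<int> mark(n, 0);
  int *markPtr = mark.data();

  #pragma acc enter data copyin(markPtr[0:n])      // device copy created once, up front

  #pragma acc parallel loop present(markPtr[0:n])  // kernel only asserts the data is resident
  for (int i = 0; i < n; ++i)
    markPtr[i] = i;

  #pragma acc exit data copyout(markPtr[0:n])      // copy results back and free the device copy
}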
However, when I execute, the process crashes with:
Current file: /home/sermus/projects/coin-clp-latest-gpu/CoinUtils/CoinUtils/src/CoinFactorization3.cpp
function: _ZNK17CoinFactorization21updateColumnUSparsishEP17CoinIndexedVectorPi
line: 1386
This file was compiled: -ta=tesla:cc70
I assume it is complaining that -ta=tesla:cc70 doesn't match my GPU architecture; however, I compiled with cc75, not cc70. Can you shed some light on what might be going on here?
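In case it helps to rule out the device side, the compute capability can also be double-checked independently of pgaccelinfo with a small CUDA runtime program (a generic snippet, not part of my build; check_cc.cu is just a made-up file name, compiled with e.g. nvcc check_cc.cu -o check_cc). It should report the same 7.5:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
  cudaDeviceProp prop;
  // Query device 0, the only GPU in this machine
  if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
    std::printf("cudaGetDeviceProperties failed\n");
    return 1;
  }
  std::printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
  return 0;
}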
My OS is Ubuntu 18.04.
The PGI version is:
pgcc 19.10-0 LLVM 64-bit target on x86-64 Linux -tp haswell
PGI Compilers and Tools
Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
The pgaccelinfo output is:
CUDA Driver Version: 10010
NVRM version: NVIDIA UNIX x86_64 Kernel Module 435.21 Sun Aug 25 08:17:57 CDT 2019
Device Number: 0
Device Name: GeForce RTX 2070 SUPER
Device Revision Number: 7.5
Global Memory Size: 8366784512
Number of Multiprocessors: 40
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1815 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 7001 MHz
Memory Bus Width: 256 bits
L2 Cache Size: 4194304 bytes
Max Threads Per SMP: 1024
Async Engines: 3
Unified Addressing: Yes
Managed Memory: Yes
Concurrent Managed Memory: Yes
Preemption Supported: Yes
Cooperative Launch: Yes
Multi-Device: Yes
PGI Default Target: -ta=tesla:cc75
The CUDA version is 10.1.