Program crashes with wrong -ta

Hello,
I have a GeForce RTX 2070 and I'm trying to port some code to the GPU with OpenACC. I put some OpenACC pragmas into my code and compiled with -ta=tesla:cc75, which seems to be the appropriate target for my GPU.
The compiler invocation looks as follows:
pgc++ -DHAVE_CONFIG_H -I. -Iecho /home/sermus/projects/coin-clp-latest-gpu/CoinUtils/CoinUtils/src -g -fast -acc -ta=tesla:cc75 -Mprof=ccff -Minfo=accel -DCOINUTILS_BUILD -c -o CoinFactorization3.lo /home/sermus/projects/coin-clp-latest-gpu/CoinUtils/CoinUtils/src/CoinFactorization3.cpp

And the output seems to be fine:
CoinFactorization::updateColumnUSparsish(CoinIndexedVector *, int ) const:
1386, Generating copyin(indexIn[:numberNonZero]) [if not already present]
Generating copy(stack[:this->_b_19CoinArrayWithLength.size/4]) [if not already present]
Generating present(next[:],mark[:],list[:])
Generating Tesla code
1396, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
1404, Generating implicit reduction(+:nMarked)

However, when I execute, the process crashes with:
Current file: /home/sermus/projects/coin-clp-latest-gpu/CoinUtils/CoinUtils/src/CoinFactorization3.cpp
function: _ZNK17CoinFactorization21updateColumnUSparsishEP17CoinIndexedVectorPi
line: 1386
This file was compiled: -ta=tesla:cc70

I assume it is complaining that -ta=tesla:cc70 doesn't match my GPU architecture; however, I compiled with cc75, not cc70. Can you shed some light on what this might be?

My OS is Ubuntu 18.04.

PGI version is

pgcc 19.10-0 LLVM 64-bit target on x86-64 Linux -tp haswell
PGI Compilers and Tools
Copyright © 2019, NVIDIA CORPORATION. All rights reserved.


pgaccelinfo is

CUDA Driver Version: 10010
NVRM version: NVIDIA UNIX x86_64 Kernel Module 435.21 Sun Aug 25 08:17:57 CDT 2019

Device Number: 0
Device Name: GeForce RTX 2070 SUPER
Device Revision Number: 7.5
Global Memory Size: 8366784512
Number of Multiprocessors: 40
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1815 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 7001 MHz
Memory Bus Width: 256 bits
L2 Cache Size: 4194304 bytes
Max Threads Per SMP: 1024
Async Engines: 3
Unified Addressing: Yes
Managed Memory: Yes
Concurrent Managed Memory: Yes
Preemption Supported: Yes
Cooperative Launch: Yes
Multi-Device: Yes
PGI Default Target: -ta=tesla:cc75

CUDA version is 10.1

Hi Sermus,

Unfortunately, I'm not sure what's wrong here. Adding "-ta=tesla:cc75" should be correct, so it's unclear why cc70 code is getting generated. I tried reproducing the error here on a Tesla T4 (also CC75) but didn't see any issues. Note that we officially only support Tesla products, but typically other NVIDIA devices will work as well if they use the same CC as a Tesla product.

Are you able to provide a reproducing example that I can use to try to recreate the error?

If not, can you add the flag “-v” (verbose) to your compilation and post the output so I can see what device code is being generated?
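As another sanity check (this is a suggestion, not something from the original posts), the CUDA toolkit's cuobjdump utility can list the device images embedded in an object file, which shows which compute capability was actually generated. The object file name below is taken from the compile line earlier in the thread; the exact output format varies by CUDA version:

```shell
# List the embedded device ELF images; each entry names its sm_XX target.
cuobjdump --list-elf CoinFactorization3.o

# Expect an entry mentioning sm_75. If an sm_70 image shows up here
# instead, the cc70 code really is coming from this object file.
```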

Also, just checking: are you using PGI to link, and is "-ta=tesla:cc75" on the link line? If you're not using PGI to link, you may need to add "-ta=tesla:cc75,nordc" to your compilation. RDC requires the code to be linked with a device linker, which won't be done if you're using a different compiler.
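To make the flag placement concrete, here is a minimal sketch of both setups (file names are placeholders, not from your build):

```shell
# Case 1: PGI performs the link. Pass the same -ta flag at compile time
# and at link time so the device code is generated and device-linked for cc75.
pgc++ -fast -acc -ta=tesla:cc75 -Minfo=accel -c foo.cpp -o foo.o
pgc++ -fast -acc -ta=tesla:cc75 foo.o -o a.out

# Case 2: a non-PGI compiler (e.g. g++) performs the link. Disable RDC so
# no separate device-link step is needed; the g++ link line then also needs
# the PGI runtime libraries ("pgc++ -dryrun foo.o" shows the full set).
pgc++ -fast -acc -ta=tesla:cc75,nordc -c foo.cpp -o foo.o
```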

-Mat

Hello,

I have exactly the same problem here: using an RTX 2070 and -ta=tesla:cc75 for both compile and link, and failing with the same error message.

I think the problem in my project was in the linking. I've found a workaround, so I may as well write up what I did here, even though I still don't fully understand what the underlying problem was. My project may be set up quite differently from the original post's, but we saw the same error message.

I was writing a CMakeLists.txt file for my project, which uses nvcc for a few CUDA object files and pgc++ for a few OpenACC object files, with no explicit RDC-related flags for either compiler (so whatever their defaults are). For a reason unrelated to this issue (explained at the end of this post), I decided to use "add_custom_target" in CMake to take over linking the executable, where I noticed that the last two lines in the following CMake script caused the issue:

    pgc++ OBJ_FILES LINK_FLAGS -o a.out
    -L GNU_LIB_PATH -l OTHER_GNU_LIB
    "-L$<JOIN:${CMAKE_CXX_IMPLICIT_LINK_DIRECTORIES},;-L>"
    "-l$<JOIN:${CMAKE_CXX_IMPLICIT_LINK_LIBRARIES},;-l>"

Removing them solved the problem.

stw


P.S.

We had a working Makefile, which simply ran the following command:

pgc++ main.o <acc objects> <cu objects> -Mcudalib=cufft,cublas <other flags> -o a.out

But CMake didn't work as nicely by default, as it uses a funky "cmake_device_link.o" mechanism (see the reference here: https://cmake.org/pipermail/cmake/2017-March/065133.html), which, in my understanding, is a sort of library-like object created by nvlink. That mechanism either used some RDC-related flags or some static/dynamic library flags, and it became too advanced for a user like me to resolve all of the "missing reference to __fatbin… symbol" errors.

Thanks stw. I'll try to file that away in my brain in case a similar situation arises. Though the error is somewhat generic, and I don't think the original post had the same root cause (unless they failed to mention that they were using CMake).

-Mat