How to compile CUDA device functions

Hi all,
OpenACC does not expose GPU-specific features that are useful for debugging, optimization, and other purposes.
For debugging, I want to call CUDA device functions from within OpenACC kernels.
Here is what I have so far:
print.cu

#include <cstdio>
// Return thread 3D index.
extern "C" __device__ int3 acc_get_thread_idx()
{
    int3 result;
    result.x = threadIdx.x;
    result.y = threadIdx.y;
    result.z = threadIdx.z;
     
    return result;
}
 
// Return block 3D index.
extern "C" __device__ int3 acc_get_block_idx()
{
    int3 result;
    result.x = blockIdx.x;
    result.y = blockIdx.y;
    result.z = blockIdx.z;
     
    return result;
}
// Print values from within the OpenACC parallel for loop.
extern "C" __device__ void print(int3 thread, int3 block, int i)
{
    printf("block: (%d, %d, %d), thread: (%d, %d, %d) :: i = %d \n", block.x, block.y, block.z, thread.x, thread.y, thread.z, i);
}

main.c

// Declaration of 3-integer structure, which is built-in
// in CUDA, but not in C/OpenACC.
typedef struct { int x, y, z; } int3;
 
#pragma acc routine seq
int3 acc_get_thread_idx();
 
#pragma acc routine seq
int3 acc_get_block_idx();

#pragma acc routine seq
void print(int3 thread, int3 block, int i);

int main(void)
{
    #pragma acc parallel loop
    for (int i = 0; i < 512; i++)
    {
       int3 thread = acc_get_thread_idx();
       int3 block = acc_get_block_idx();

       // Print values from within the OpenACC parallel for loop.
       print(thread, block, i);
    }
}

How do I compile this with pgcc?
Should nvcc be used for print.cu?

Note: I am using PGI on Windows.

Hi LeMoussel,

Yes, you need to use nvcc to compile CUDA code. PGI and nvcc objects are link compatible.

For these operations, though, PGI does provide extensions to OpenACC that let you query the block and thread indices.

From “openacc.h”:

extern int __pgi_gangidx(void);
extern int __pgi_workeridx(void);
extern int __pgi_vectoridx(void);
extern int __pgi_blockidx(int);
extern int __pgi_threadidx(int);

Hope this helps,
Mat
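
A minimal sketch of how these extensions might be used directly from C, without the CUDA helper functions. Assumptions that are not confirmed in this thread: the int argument selects the dimension, with 1 meaning x (check openacc.h in your installation), and the names DBG_BLOCK_X, DBG_THREAD_X, record_indices and print_indices are made up for illustration. On any compiler other than PGI with OpenACC enabled, the macros fall back to 0 so the file still builds and runs on the host.

```c
#include <assert.h>
#include <stdio.h>

#if defined(_OPENACC) && defined(__PGI)
#include <openacc.h>                        /* declares the __pgi_* extensions */
#define DBG_BLOCK_X()  __pgi_blockidx(1)    /* assumption: 1 selects the x dimension */
#define DBG_THREAD_X() __pgi_threadidx(1)   /* assumption: 1 selects the x dimension */
#else
#define DBG_BLOCK_X()  0                    /* host stand-in, no GPU indices available */
#define DBG_THREAD_X() 0
#endif

/* Record, for each iteration i, the block and thread index that ran it. */
void record_indices(int *block, int *thread, int n)
{
    assert(block != NULL && thread != NULL && n >= 0);

    #pragma acc parallel loop copyout(block[0:n], thread[0:n])
    for (int i = 0; i < n; i++) {
        block[i]  = DBG_BLOCK_X();
        thread[i] = DBG_THREAD_X();
    }
}

/* Print the recorded indices on the host, after the kernel has finished. */
void print_indices(const int *block, const int *thread, int n)
{
    for (int i = 0; i < n; i++)
        printf("block: %d, thread: %d :: i = %d\n", block[i], thread[i], i);
}
```

Calling record_indices(block, thread, 512) followed by print_indices(block, thread, 512) from main would list which block and thread handled each iteration. Recording into arrays and printing on the host keeps the example independent of device-side printf support.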

Hi Mat,
With this Makefile:

CC = pgcc
CFLAGS = -g -O3 -Minfo=all -ta=nvidia:cc30 -fast
NVCC = nvcc
NVCCFLAGS = -rdc=true -arch=sm_30

all: main
 
main: main.o print.o
	$(CC) $(CFLAGS) $^ -o $@ 

main.o: main.c
	$(CC) $(CFLAGS) -c $< -o $@

print.o: print.cu
	$(NVCC) $(NVCCFLAGS) -c $< -o $@

clean:
	rm -rf main *.o

I get an error saying that bin2c is unknown: there is no bin2c.exe in C:\Program Files\PGICE\win64\2018\cuda\9.1\bin.
Strange, because the CUDA Toolkit 9.1 does have bin2c.exe in its bin directory.

If I copy bin2c.exe from the CUDA Toolkit 9.1, I then get an error saying that cudafe++ is unknown. Likewise, I copied cudafe++.exe from the CUDA Toolkit 9.1,
but now I get this error:

pgcc -g -O3 -Minfo=all -ta=nvidia:cc30 -fast  -c main.c -o main.o
nvcc.exe  -rdc=true -arch=sm_30 -c print.cu -o print.o
print.cu
pgcc -g -O3 -Minfo=all -ta=nvidia:cc30 -fast  main.o print.o -o main 
nvlink fatal   : Input file 'print.o' newer than toolkit
child process exit with signal 2: C:\PROGRA~1\PGICE\win64\18.4\bin\pgnvd.exe

Hi LeMoussel,

A couple of issues here.

First, we only ship the CUDA components needed to compile CUDA Fortran and OpenACC, so the PGI installation is not a full CUDA toolkit. Please use a CUDA SDK installation when compiling the CUDA code.

Second, you need to match the CUDA version between the two compilations. The default CUDA version depends on the PGI version you’re using; PGI 18.4 defaults to CUDA 8.0. If you want a version other than the default, set it as part of the compiler flags. For example: “-ta=tesla:cc30,cuda9.1”. Alternatively, with the PGI 2018 compilers, you can set the environment variable “CUDA_HOME” to the location of your CUDA SDK. See: https://www.pgroup.com/resources/docs/18.7/x86/pgi-release-notes/index.htm#cuda-toolkit-versions

Next, add the “-Mcuda” flag to your PGI compilation flags. This will indicate that you are going to link your program with CUDA build objects.

Finally, add “-rdc=true” to your nvcc compilation flags.

-Mat

% nvcc -c -O2 print.cu -rdc=true
% pgcc -ta=tesla:cc70,cuda9.1 -Mcuda main.c print.o -Minfo=accel
main.c:
main:
     17, Accelerator kernel generated
         Generating Tesla code
         18, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
     17, Local memory used for thread,block
% a.out
block: (1, 0, 0), thread: (0, 0, 0) :: i = 128
block: (1, 0, 0), thread: (1, 0, 0) :: i = 129
block: (1, 0, 0), thread: (2, 0, 0) :: i = 130
block: (1, 0, 0), thread: (3, 0, 0) :: i = 131
block: (1, 0, 0), thread: (4, 0, 0) :: i = 132
....

Hi Mat,

  1. CUDA SDK installation (CUDA Toolkit 9.1 Download - Archived: https://developer.nvidia.com/cuda-91-download-archive) and compilation of the CUDA code: OK.
  2. PGI compilation: failed. I get the following message:
$ pgcc -o main -O2 -fast -acc -ta=tesla:cuda9.1 -Minfo=all -Mcuda main.o print.o
LINK : fatal error LNK1104: cannot open file 'libcudapgi.lib'
./main.exf: error STP001: cannot open file

$ pgcc --version

pgcc 18.4-0 64-bit target on x86-64 Windows -tp shanghai
PGI Compilers and Tools
Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.

Note: ‘libcudapgi.lib’ is not present in the PGICE directory.

Hi LeMoussel,

Apologies for that. Our manufacturing folks made an error in 18.4 that accidentally left libcudapgi.lib out of the final installation packages. We fixed this in 18.5.

Please send a note to PGI Customer Service (support@pgroup.com) and we can send you the file which you can include with 18.4.

-Mat

I sent an email to PGI Customer Service today.

Hi Mat, I received libcudapgi.lib.
I copied it into C:\Program Files\PGICE\win64\18.4\lib

There are no more compilation errors, but now I get this error at run time:

$ main
call to cudaGetSymbolAddress returned error 13: Other

Compiled with: pgcc 18.4-0 64-bit target on x86-64 Windows -tp shanghai

I believe this is a mismatch between the CUDA versions used to build the OpenACC code and the CUDA code. What version of CUDA are you using, and did you set the compile-time options so that OpenACC matches it? For example: “-ta=tesla:cuda9.2”, or set the environment variable “CUDA_HOME=<path/to/cuda/dir>”.

-Mat

I found the error.
I was running:

$ pgcc -ta=tesla:cc70,cuda9.1 -Mcuda main.c print.obj -Minfo=accel

cc70 causes the error. The command has to be:

pgcc -ta=tesla:cuda9.1 -Mcuda main.c print.obj -Minfo=accel

Mat, thanks for your help!

Aah, yes, cc70 is what I’m using since I have a V100. You need to change this to match the device you have, or remove it if your device is one of the default targets.

-Mat
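
For anyone landing here later: the ccXY sub-option is the device's compute capability with the dot removed (7.0 for a V100 gives cc70; 3.0 gives cc30, as in the Makefile earlier in the thread). The pgaccelinfo utility that ships with PGI should report that value for your device. Below is a minimal sketch in C of how a build script might assemble the option. make_ta_flag is a hypothetical helper, not part of PGI or CUDA, and the option format is copied from the commands in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a PGI or CUDA API): writes a -ta= option into buf.
   A cc_major of 0 or less omits the ccXY sub-option, which leaves the
   compiler on its default targets. Returns 0 on success, -1 on bad input
   or if buf is too small. */
int make_ta_flag(char *buf, size_t len, int cc_major, int cc_minor,
                 const char *cuda_version)
{
    int n;

    assert(buf != NULL && len > 0);
    if (cuda_version == NULL || strlen(cuda_version) == 0 || cc_minor < 0)
        return -1;

    if (cc_major > 0)
        n = snprintf(buf, len, "-ta=tesla:cc%d%d,cuda%s",
                     cc_major, cc_minor, cuda_version);
    else
        n = snprintf(buf, len, "-ta=tesla:cuda%s", cuda_version);

    /* snprintf reports the length it needed; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With these inputs, make_ta_flag(buf, sizeof buf, 7, 0, "9.1") produces the option Mat used on his V100, and passing 0 for cc_major produces the form that worked for LeMoussel.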