I can compile and execute without errors. However, I don’t think the GPU is being used at all: I don’t get any parallelization messages, and when I ran the PGI profiler to check, it confirmed that no GPU activity takes place.
Do you have any idea why?
Also, see the last comment in the comment section at that same link; that is me, and it has more information.
I’m not sure. I just tried one of my codes: GCC was able to compile and run it on my P100, and I was able to get a GPU profile.
I see that you posted a question on Krister’s blog. Hopefully they can help.
If not, please post a full reproducing example as well as the compile and run commands that you’re using. Also, please let me know what kind of NVIDIA device you’re using and which CUDA version is installed.
Here’s my run using SPEC ACCEL’s 303.ostencil benchmark:
% gcc --version
gcc (GCC) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
% sh -x make.out
+ gcc -c -o main.o -DSPEC -DSPEC_ACCEL -DNDEBUG -I./ -I./pbcommon_sources -O3 -fno-fast-math -foffload=-lm -lm -fopenacc main.c
+ gcc -c -o file.o -DSPEC -DSPEC_ACCEL -DNDEBUG -I./ -I./pbcommon_sources -O3 -fno-fast-math -foffload=-lm -lm -fopenacc file.c
+ gcc -c -o kernels.o -DSPEC -DSPEC_ACCEL -DNDEBUG -I./ -I./pbcommon_sources -O3 -fno-fast-math -foffload=-lm -lm -fopenacc kernels.c
+ gcc -c -o pbcommon_sources/parboil.o -DSPEC -DSPEC_ACCEL -DNDEBUG -I./ -I./pbcommon_sources -O3 -fno-fast-math -foffload=-lm -lm -fopenacc pbcommon_sources/parboil.c
+ gcc -O3 -fno-fast-math -foffload=-lm -lm -fopenacc main.o file.o kernels.o pbcommon_sources/parboil.o -lm -o ostencil_exe
% pgprof ./ostencil_exe -o 512x512x98.out -- 512 512 98 200
CPU-based 7 points stencil codes****
Original version by Li-Wen Chang <lchang20@illinois.edu> and I-Jui Sung<sung10@illinois.edu>
This version maintained by Chris Rodrigues ***********
CONSUME ARG: - o
CONSUME ARG: - -
==48574== PGPROF is profiling process 48574, command: ./ostencil_exe -o 512x512x98.out -- 512 512 98 200
IO : 1.146225
Compute : 2.969159
Timer Wall Time: 4.115386
==48574== Profiling application: ./ostencil_exe -o 512x512x98.out -- 512 512 98 200
==48574== Profiling result:
Time(%)      Time     Calls       Avg       Min       Max  Name
 94.01%  839.47ms       200  4.1973ms  4.1162ms  4.4123ms  cpu_stencil$_omp_fn$0
  3.75%  33.443ms      1002  33.375us     831ns  16.419ms  [CUDA memcpy HtoD]
  2.12%  18.934ms         2  9.4671ms  9.2627ms  9.6714ms  [CUDA memcpy DtoH]
  0.13%  1.1472ms       200  5.7350us  4.9890us  11.198us  [CUDA memcpy HtoH]
What did you do to install GCC with offloading support? Did you follow steps similar to the ones on the blog?
My GPU is an NVIDIA GeForce GTX 950M.
nvcc --version gives me:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176