Hi,
Compilation appears to succeed with pgcc 16.3; however, I am not able to run on the NVIDIA GPU.
With the following configure command:
$HOME/configure \
...... \
--disable-static \
--enable-shared \
--host=x86_64-unknown-linux-gnu \
LDFLAGS="-L/opt/pgi/linux86-64/16.3/lib -L/usr/lib64" \
LIBS="-lcuda -lm -lpthread -lrt -lpgc -lnspgc -lnuma -lpgmp -lpgftnrtl -lz -ldl -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902" \
CC=/opt/pgi/linux86-64/16.3/bin/pgcc \
CFLAGS="-c99 -fast -acc -ta=tesla:nordc -shared -Minfo=accel -fPIC -noswitcherror" \
FC=/opt/pgi/linux86-64/16.3/bin/pgf95 \
FCFLAGS="-fast -Mpreprocess -noswitcherror" \
CXX=/opt/pgi/linux86-64/16.3/bin/pgc++ \
CXXFLAGS="-noswitcherror -acc -ta=tesla:nordc -shared -Minfo=accel"
I get the following error:
Current file:
$HOME/cs_matrix.c
function: _mat_vec_p_l_csr
line: 2237
Current region was compiled for:
NVIDIA Tesla GPU sm30 sm35 sm30 sm35 sm50
Available accelerators:
device[1]: Native X86 (CURRENT DEVICE)
The accelerator does not match the profile for which this program was compiled
solver script exited with status 1.
Error running the calculation.
Running the ldd command as a regular user:
[jackie@localhost $HOME]$ ldd -v cs_solver
linux-vdso.so.1 => (0x00007fffcbf46000)
libsaturne.so.0 => /home/huchuanwei/Desktop/saturne_build2.4/prod/arch/lib/libsaturne.so.0 (0x00007f5045f69000)
libple.so.1 => /home/huchuanwei/Desktop/saturne_build2.4/prod/arch/lib/libple.so.1 (0x00007f5045d62000)
libmedC.so.1 => /opt/med-3.0.8//lib/libmedC.so.1 (0x00007f5045a53000)
libhdf5.so.9 => /opt/hdf5/lib/libhdf5.so.9 (0x00007f504557d000)
libxml2.so.2 => /opt/libxml2//lib/libxml2.so.2 (0x00007f5045223000)
libblas.so.0 => /opt/pgi/linux86-64/16.3/lib/libblas.so.0 (0x00007f50440cd000)
libz.so.1 => /lib64/libz.so.1 (0x0000003fd1e00000)
libcuda.so.1 => /usr/lib64/nvidia/libcuda.so.1 (0x00007f50436c4000)
librt.so.1 => /lib64/librt.so.1 (0x0000003fd2200000)
libpgc.so => /opt/pgi/linux86-64/16.3/lib/libpgc.so (0x00007f504343b000)
libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000003fd3a00000)
libpgmp.so => /opt/pgi/linux86-64/16.3/lib/libpgmp.so (0x00007f50431bb000)
libpgftnrtl.so => /opt/pgi/linux86-64/16.3/lib/libpgftnrtl.so (0x00007f5042f85000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003fd1600000)
libpgf90rtl.so => /opt/pgi/linux86-64/16.3/lib/libpgf90rtl.so (0x00007f5042d5f000)
libpgf90.so => /opt/pgi/linux86-64/16.3/lib/libpgf90.so (0x00007f50427b4000)
libpgf90_rpm1.so => /opt/pgi/linux86-64/16.3/lib/libpgf90_rpm1.so (0x00007f50425b2000)
libpgf902.so => /opt/pgi/linux86-64/16.3/lib/libpgf902.so (0x00007f504239f000)
libcudadevice.so => /opt/pgi/linux86-64/16.3/lib/libcudadevice.so (0x00007f504218e000)
libaccapi.so => /opt/pgi/linux86-64/16.3/lib/libaccapi.so (0x00007f5041f71000)
libaccg.so => /opt/pgi/linux86-64/16.3/lib/libaccg.so (0x00007f5041d55000)
libaccn.so => /opt/pgi/linux86-64/16.3/lib/libaccn.so (0x00007f5041b31000)
libaccg2.so => /opt/pgi/linux86-64/16.3/lib/libaccg2.so (0x00007f5041924000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003fd1a00000)
libm.so.6 => /lib64/libm.so.6 (0x0000003fd1200000)
libc.so.6 => /lib64/libc.so.6 (0x0000003fd0e00000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003cd5400000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003cd5000000)
libnvidia-fatbinaryloader.so.367.48 => /usr/lib64/nvidia/libnvidia-fatbinaryloader.so.367.48 (0x00007f50416d3000)
/lib64/ld-linux-x86-64.so.2 (0x0000003fd0a00000)
best,
Jackie
Hi Jackie,
What’s the difference between a “regular user” and the case where it doesn’t work?
My best guess is the erroneous case doesn’t have execute permissions on the CUDA driver runtime library, “libcuda.so”.
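If you want a quick way to check that from the failing environment, a small test along these lines would do (just a sketch; the path below is taken from your ldd output and is only an assumption about where the driver lives on your system):

/* perm_check.c: rough check that the CUDA driver library is visible
 * and readable/executable for the current user.
 * The path is an assumption; adjust it to where libcuda.so.1 lives. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *lib = "/usr/lib64/nvidia/libcuda.so.1";  /* assumed location */

    if (access(lib, R_OK | X_OK) == 0)
        printf("%s is readable and executable\n", lib);
    else
        perror(lib);

    return 0;
}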
Hi Mat, thank you so much for your quick reply.
What I forgot to say is that I added some OpenACC directives to the C code for the GPU; the rest of the code has no OpenACC directives.
I think the regular user does have execute permissions on “libcuda.so”, and as a regular user I can use OpenACC directly for simple codes.
When I run the executable (./cs_solver, which is launched via a Python script), I get the error. The executable's library version information is as follows:
Version information:
./cs_solver:
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
/home/jackie/Desktop/saturne_build2.4/prod/arch/lib/libsaturne.so.0:
libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
librt.so.1 (GLIBC_2.2.5) => /lib64/librt.so.1
libxml2.so.2 (LIBXML2_2.4.30) => /opt/libxml2//lib/libxml2.so.2
libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/home/jackie/Desktop/saturne_build2.4/prod/arch/lib/libple.so.1:
libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/med-3.0.8//lib/libmedC.so.1:
libgcc_s.so.1 (GCC_3.0) => /lib64/libgcc_s.so.1
libstdc++.so.6 (GLIBCXX_3.4.9) => /usr/lib64/libstdc++.so.6
libstdc++.so.6 (CXXABI_1.3) => /usr/lib64/libstdc++.so.6
libstdc++.so.6 (GLIBCXX_3.4) => /usr/lib64/libstdc++.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/hdf5/lib/libhdf5.so.9:
libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.7) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/libxml2//lib/libxml2.so.2:
libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
libz.so.1 (ZLIB_1.2.2.3) => /lib64/libz.so.1
libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.7) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
/lib64/libz.so.1:
libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/usr/lib64/nvidia/libcuda.so.1:
libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
librt.so.1 (GLIBC_2.2.5) => /lib64/librt.so.1
libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/lib64/librt.so.1:
libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
libpthread.so.0 (GLIBC_PRIVATE) => /lib64/libpthread.so.0
libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
/opt/pgi/linux86-64/16.3/lib/libpgc.so:
ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/usr/lib64/libnuma.so.1:
ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/pgi/linux86-64/16.3/lib/libpgmp.so:
ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/pgi/linux86-64/16.3/lib/libpgftnrtl.so:
ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/lib64/libdl.so.2:
ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/pgi/linux86-64/16.3/lib/libpgf90rtl.so:
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/pgi/linux86-64/16.3/lib/libpgf90.so:
ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/pgi/linux86-64/16.3/lib/libpgf90_rpm1.so:
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/pgi/linux86-64/16.3/lib/libpgf902.so:
ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/pgi/linux86-64/16.3/lib/libcudadevice.so:
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/pgi/linux86-64/16.3/lib/libaccapi.so:
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
/opt/pgi/linux86-64/16.3/lib/libaccg.so:
ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/pgi/linux86-64/16.3/lib/libaccn.so:
ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/opt/pgi/linux86-64/16.3/lib/libaccg2.so:
ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/lib64/libpthread.so.0:
ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/lib64/libm.so.6:
libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/lib64/libc.so.6:
ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
/usr/lib64/libstdc++.so.6:
libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
libgcc_s.so.1 (GCC_4.2.0) => /lib64/libgcc_s.so.1
libgcc_s.so.1 (GCC_3.3) => /lib64/libgcc_s.so.1
libgcc_s.so.1 (GCC_3.0) => /lib64/libgcc_s.so.1
libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/lib64/libgcc_s.so.1:
libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
/usr/lib64/nvidia/libnvidia-fatbinaryloader.so.367.48:
libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
The same check for libcuda.so, which is used indirectly, and for the libraries it depends on:
[jackie@localhost lib64]$ ls -al /usr/lib64/nvidia/libcuda.so*
lrwxrwxrwx. 1 root root 17 Oct 13 09:06 /usr/lib64/nvidia/libcuda.so.1 -> libcuda.so.367.48
-rwxr-xr-x. 1 root root 8222824 Sep 4 08:52 /usr/lib64/nvidia/libcuda.so.367.48
[jackie@localhost lib64]$ ls -al /lib64/libm*
-rwxr-xr-x. 1 root root 599392 Jan 28 2015 /lib64/libm-2.12.so
lrwxrwxrwx. 1 root root 12 Mar 30 2015 /lib64/libm.so.6 -> libm-2.12.so
[jackie@localhost lib64]$ ls -al /lib64/librt*
-rwxr-xr-x. 1 root root 47112 Jan 28 2015 /lib64/librt-2.12.so
lrwxrwxrwx. 1 root root 13 Mar 30 2015 /lib64/librt.so.1 -> librt-2.12.so
[jackie@localhost lib64]$ ls -al /lib64/libpthread*
-rwxr-xr-x. 1 root root 145896 Jan 28 2015 /lib64/libpthread-2.12.so
lrwxrwxrwx. 1 root root 18 Mar 30 2015 /lib64/libpthread.so.0 -> libpthread-2.12.so
[jackie@localhost lib64]$ ls -al /lib64/libc*
-rwxr-xr-x. 1 root root 1926760 Jan 28 2015 /lib64/libc-2.12.so
lrwxrwxrwx. 1 root root 12 Mar 30 2015 /lib64/libc.so.6 -> libc-2.12.so
best,
Jackie
Hi Jackie,
When you see this error, it means that the runtime is unable to load the CUDA driver runtime library, “libcuda.so”. Typically it’s because the CUDA driver is not installed, but it could also be due to permissions, or the library could be in a non-standard location.
I’m still unclear what the difference is between the working “regular user” case and the failing case. If you look at the differences between these two environments, it might give you a clue as to the problem.
I get the error. The executable's library version information is as follows:
libcuda.so is loaded by the OpenACC runtime and not directly linked to your binary. This allows multiple device targets to be compiled into the same binary.
[jackie@localhost lib64]$ ls -al /usr/lib64/nvidia/libcuda.so*
lrwxrwxrwx. 1 root root 17 Oct 13 09:06 /usr/lib64/nvidia/libcuda.so.1 → libcuda.so.367.48
-rwxr-xr-x. 1 root root 8222824 Sep 4 08:52 /usr/lib64/nvidia/libcuda.so.367.48
“/usr/lib64/nvidia” is not a typical installation location, so the loader may not be able to find it. Can you try setting the environment variable “LD_LIBRARY_PATH=/usr/lib64/nvidia:$LD_LIBRARY_PATH” to see if that helps? If not, can you try reinstalling the CUDA driver in “/usr/lib64”?
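If it helps to isolate the problem, here is a minimal sketch (not part of your build) that simply asks the OpenACC runtime how many NVIDIA devices it can see. If it prints 0 when run from the same shell that launches cs_solver, the runtime cannot reach libcuda.so in that environment. Compile it with something like “pgcc -acc probe.c” (the file name is just for illustration):

/* probe.c: ask the OpenACC runtime how many NVIDIA devices it can see.
 * The runtime loads libcuda.so on demand, so a result of 0 usually means
 * the driver library could not be found or loaded. */
#include <stdio.h>
#include <openacc.h>

int main(void)
{
    int n = acc_get_num_devices(acc_device_nvidia);

    printf("OpenACC NVIDIA devices visible: %d\n", n);
    return 0;
}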
Thanks, Mat.
I’ve managed to get something working with pointers in the way you suggested, but I still get the same error.
I’m still unclear what the difference is between the working “regular user” case and the failing case.
The “working case for a regular user” is when I compile the OpenACC examples downloaded from websites; those work. But this case doesn’t work, and I still get the same failure when I compile as root.
“/usr/lib64/nvidia” is not a typical installation location, so the loader may not be able to find it. Can you try setting the environment variable “LD_LIBRARY_PATH=/usr/lib64/nvidia:$LD_LIBRARY_PATH” to see if that helps? If not, can you try reinstalling the CUDA driver in “/usr/lib64”?
I tried setting the environment variable “LD_LIBRARY_PATH=/usr/lib64/nvidia:$LD_LIBRARY_PATH” and reinstalling. Nothing changed; I still get exactly the same message:
Current region was compiled for:
NVIDIA Tesla GPU sm30 sm35 sm30 sm35 sm50
Available accelerators:
device[1]: Native X86 (CURRENT DEVICE)
The accelerator does not match the profile for which this program was compiled
solver script exited with status 1.
libcuda.so is loaded by the OpenACC runtime and not directly linked to your binary.
Somehow I get the list of unused “*.so” dependencies below for the executable, where libcuda.so is reported as not used. Does this mean that when I run the executable, libcuda.so is loaded by the OpenACC runtime and does not need to be linked?
[root@localhost code_saturne]# ldd -u cs_solver
Unused direct dependencies:
/home/webber/cs_development/cs-man-build2/prod/arch/lib/libsaturne.so.0
/home/webber/cs_development/cs-man-build2/prod/arch/lib/libple.so.1
/opt/med-3.0.8//lib/libmedC.so.1
/opt/hdf5/lib/libhdf5.so.9
libmpi.so.1
/usr/local/libxml2//lib/libxml2.so.2
/opt/pgi/linux86-64/2016/lib/libblas.so.0
/usr/lib64/libz.so.1
/usr/lib64/libcuda.so.1
/usr/lib64/librt.so.1
/opt/pgi/linux86-64/2016/lib/libpgc.so
/usr/lib64/libnuma.so.1
/opt/pgi/linux86-64/2016/lib/libpgmp.so
/opt/pgi/linux86-64/2016/lib/libpgftnrtl.so
/usr/lib64/libdl.so.2
/opt/pgi/linux86-64/2016/lib/libpgf90rtl.so
/opt/pgi/linux86-64/2016/lib/libpgf90.so
/opt/pgi/linux86-64/2016/lib/libpgf90_rpm1.so
/opt/pgi/linux86-64/2016/lib/libpgf902.so
/opt/pgi/linux86-64/2016/lib/libaccapi.so
/opt/pgi/linux86-64/2016/lib/libaccg.so
/opt/pgi/linux86-64/2016/lib/libaccn.so
/opt/pgi/linux86-64/2016/lib/libaccg2.so
/opt/pgi/linux86-64/2016/lib/libcudadevice.so
/usr/lib64/libpthread.so.0
/usr/lib64/libm.so.6
/usr/lib64/libc.so.6
I have looked at many similar problems in this forum, but they are different. I can run the samples from the website, but I can’t run this file. Could this problem be connected to libtool, the Python makefile, or something similar?
Kind regards,
Jackie
Hi Jackie,
The PGI OpenACC runtime uses dlopen to open “libcuda.so”. The error indicates that dlopen is failing to open the library.
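If you want to check that directly from the environment where cs_solver fails, a small test along these lines mimics what the runtime does (just a sketch; it uses the “libcuda.so.1” name from your ldd output, and the exact name the runtime opens may differ):

/* dlopen_check.c: try to load the CUDA driver library the way a runtime
 * would, and print the loader's error message if it fails.
 * Build with something like: cc dlopen_check.c -ldl */
#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    void *h = dlopen("libcuda.so.1", RTLD_NOW);  /* searched via LD_LIBRARY_PATH, ld.so.conf, ... */

    if (h == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    printf("libcuda.so.1 loaded successfully\n");
    dlclose(h);
    return 0;
}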
The “working case for a regular user” is when I compile the OpenACC examples downloaded from websites; those work. But this case doesn’t work, and I still get the same failure when I compile as root.
This doesn’t make sense to me. There should be no difference between running the example codes and running a user code within the same environment on the same system.
Can you please run the utilities “pgaccelinfo” and “nvidia-smi”?
Also, can you verify the example codes are actually running on your device? (i.e. set the environment variable PGI_ACC_NOTIFY and note if there are any kernel launches).
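For the PGI_ACC_NOTIFY check, any trivial kernel will do; a sketch like the one below is enough. Build it with the flags you normally use (for example “pgcc -acc -ta=tesla notify_test.c”, names for illustration only), then run it with PGI_ACC_NOTIFY=1 set; if the GPU is being used you should see a launch line printed for the loop.

/* notify_test.c: one trivial OpenACC compute region. With PGI_ACC_NOTIFY=1
 * the PGI runtime prints a line for each kernel launch, which confirms the
 * code is actually running on the device rather than the host. */
#include <stdio.h>

#define N (1 << 20)

int main(void)
{
    static double a[N];
    int i;

    #pragma acc parallel loop copyout(a[0:N])
    for (i = 0; i < N; ++i)
        a[i] = 2.0 * i;

    printf("a[42] = %f\n", a[42]);
    return 0;
}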
Thanks, Mat,
The PGI OpenACC runtime uses dlopen to open “libcuda.so”. The error indicates that dlopen is failing to open the library.
I use libtool to cross-compile a free software project, which calls functions (with OpenACC directives added) from a GUI interface. Does this setup have an impact on dlopen? Can I use a script to offload to the GPU?
The following is the output of the utilities “pgaccelinfo” and “nvidia-smi”:
[jackie@localhost $HOME]$ pgaccelinfo
CUDA Driver Version: 8000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.48 Sat Sep 3 18:21:08 PDT 2016
Device Number: 0
Device Name: Tesla K40c
Device Revision Number: 3.5
Global Memory Size: 11995054080
Number of Multiprocessors: 15
Number of SP Cores: 2880
Number of DP Cores: 960
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 745 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: Yes
Memory Clock Rate: 3004 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 1572864 bytes
Max Threads Per SMP: 2048
Async Engines: 2
Unified Addressing: Yes
Managed Memory: Yes
PGI Compiler Option: -ta=tesla:cc35
Device Number: 1
Device Name: Quadro K5000
Device Revision Number: 3.0
Global Memory Size: 4231135232
Number of Multiprocessors: 8
Number of SP Cores: 1536
Number of DP Cores: 512
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 705 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 2700 MHz
Memory Bus Width: 256 bits
L2 Cache Size: 524288 bytes
Max Threads Per SMP: 2048
Async Engines: 2
Unified Addressing: Yes
Managed Memory: Yes
PGI Compiler Option: -ta=tesla:cc30
[jackie@localhost solution.parallel]$ nvidia-smi
Wed Oct 19 20:04:04 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K40c Off | 0000:03:00.0 Off | 0 |
| 23% 39C P8 19W / 235W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro K5000 Off | 0000:82:00.0 On | Off |
| 31% 44C P8 16W / 137W | 70MiB / 4035MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 5862 G /usr/bin/Xorg 68MiB |
+-----------------------------------------------------------------------------+
best,
Jackie
Hi Jackie,
Is the OpenACC code being included in a shared object (.so)?
If so, then you’ll need to compile without relocatable device code (RDC) by adding “-ta=tesla:nordc” to your compilation flags. RDC requires a device link step, which isn’t performed if you’re not linking with the PGI compiler driver. The caveat of turning off RDC is that you can no longer make subroutine calls from device code (use inlining instead, via the -Minline flag) nor use global variables on the device (via the declare directive).
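As a rough illustration of that caveat (a sketch only, not your actual code): with nordc the body of a compute region has to be self-contained on the device, so a helper such as scale() below must be defined in the same file and inlined (automatically or via -Minline) rather than called as a separate device routine.

/* nordc_sketch.c: illustration of the no-RDC restriction.
 * Compiled with -ta=tesla:nordc (as in your CFLAGS), the call to scale()
 * must be inlined, because without RDC the device code cannot call
 * externally linked routines. */
#include <stdio.h>

#define N 1024

static inline double scale(double x)   /* must be inlinable for the device */
{
    return 2.0 * x;
}

int main(void)
{
    static double a[N];
    int i;

    #pragma acc parallel loop copyout(a[0:N])
    for (i = 0; i < N; ++i)
        a[i] = scale((double)i);

    printf("a[10] = %f\n", a[10]);
    return 0;
}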