No Available accelerator

Hi,

Compilation appears successful with pgcc 16.3; however, I am not able to run on the NVIDIA GPU.
I use the following configure line:

$HOME/configure \
...... \
--disable-static \
--enable-shared \
--host=x86_64-unknown-linux-gnu \
LDFLAGS="-L/opt/pgi/linux86-64/16.3/lib -L/usr/lib64" \
LIBS="-lcuda -lm -lpthread -lrt -lpgc -lnspgc -lnuma -lpgmp -lpgftnrtl -lz -ldl -lpgf90rtl -lpgf90 -lpgf90_rpm1 -lpgf902" \
CC=/opt/pgi/linux86-64/16.3/bin/pgcc \
CFLAGS="-c99 -fast -acc -ta=tesla:nordc -shared -Minfo=accel -fPIC -noswitcherror" \
FC=/opt/pgi/linux86-64/16.3/bin/pgf95 \
FCFLAGS="-fast -Mpreprocess -noswitcherror" \
CXX=/opt/pgi/linux86-64/16.3/bin/pgc++ \
CXXFLAGS="-noswitcherror -acc -ta=tesla:nordc -shared -Minfo=accel"

I get the following error:

Current file: 
$HOME/cs_matrix.c
function: _mat_vec_p_l_csr
line: 2237
Current region was compiled for:
NVIDIA Tesla GPU sm30 sm35 sm30 sm35 sm50
Available accelerators:
device[1]: Native X86 (CURRENT DEVICE)
The accelerator does not match the profile for which this program was compiled
solver script exited with status 1.

Error running the calculation.

Running the ldd command as a regular user:

[jackie@localhost $HOME]$ ldd -v cs_solver 
	linux-vdso.so.1 =>  (0x00007fffcbf46000)
	libsaturne.so.0 => /home/huchuanwei/Desktop/saturne_build2.4/prod/arch/lib/libsaturne.so.0 (0x00007f5045f69000)
	libple.so.1 => /home/huchuanwei/Desktop/saturne_build2.4/prod/arch/lib/libple.so.1 (0x00007f5045d62000)
	libmedC.so.1 => /opt/med-3.0.8//lib/libmedC.so.1 (0x00007f5045a53000)
	libhdf5.so.9 => /opt/hdf5/lib/libhdf5.so.9 (0x00007f504557d000)
	libxml2.so.2 => /opt/libxml2//lib/libxml2.so.2 (0x00007f5045223000)
	libblas.so.0 => /opt/pgi/linux86-64/16.3/lib/libblas.so.0 (0x00007f50440cd000)
	libz.so.1 => /lib64/libz.so.1 (0x0000003fd1e00000)
	libcuda.so.1 => /usr/lib64/nvidia/libcuda.so.1 (0x00007f50436c4000)
	librt.so.1 => /lib64/librt.so.1 (0x0000003fd2200000)
	libpgc.so => /opt/pgi/linux86-64/16.3/lib/libpgc.so (0x00007f504343b000)
	libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000003fd3a00000)
	libpgmp.so => /opt/pgi/linux86-64/16.3/lib/libpgmp.so (0x00007f50431bb000)
	libpgftnrtl.so => /opt/pgi/linux86-64/16.3/lib/libpgftnrtl.so (0x00007f5042f85000)
	libdl.so.2 => /lib64/libdl.so.2 (0x0000003fd1600000)
	libpgf90rtl.so => /opt/pgi/linux86-64/16.3/lib/libpgf90rtl.so (0x00007f5042d5f000)
	libpgf90.so => /opt/pgi/linux86-64/16.3/lib/libpgf90.so (0x00007f50427b4000)
	libpgf90_rpm1.so => /opt/pgi/linux86-64/16.3/lib/libpgf90_rpm1.so (0x00007f50425b2000)
	libpgf902.so => /opt/pgi/linux86-64/16.3/lib/libpgf902.so (0x00007f504239f000)
	libcudadevice.so => /opt/pgi/linux86-64/16.3/lib/libcudadevice.so (0x00007f504218e000)
	libaccapi.so => /opt/pgi/linux86-64/16.3/lib/libaccapi.so (0x00007f5041f71000)
	libaccg.so => /opt/pgi/linux86-64/16.3/lib/libaccg.so (0x00007f5041d55000)
	libaccn.so => /opt/pgi/linux86-64/16.3/lib/libaccn.so (0x00007f5041b31000)
	libaccg2.so => /opt/pgi/linux86-64/16.3/lib/libaccg2.so (0x00007f5041924000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003fd1a00000)
	libm.so.6 => /lib64/libm.so.6 (0x0000003fd1200000)
	libc.so.6 => /lib64/libc.so.6 (0x0000003fd0e00000)
	libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003cd5400000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003cd5000000)
	libnvidia-fatbinaryloader.so.367.48 => /usr/lib64/nvidia/libnvidia-fatbinaryloader.so.367.48 (0x00007f50416d3000)
	/lib64/ld-linux-x86-64.so.2 (0x0000003fd0a00000)

best,

Jackie

Hi Jackie,

What’s the difference between a “regular user” and the case where it doesn’t work?

My best guess is that the failing case doesn’t have execute permissions on the CUDA driver runtime library, “libcuda.so”.

  • Mat

Hi Mat, thank you so much for your quick reply.
What I forgot to say is that the C code has some OpenACC directives added for the GPU; the other code has no OpenACC directives at all.
I think a regular user does have execute permissions on “libcuda.so”, and as a regular user I can use OpenACC directly for simple codes.
When I run the executable (./cs_solver, which is launched via a Python script), I get the error. The executable’s library version information follows:

Version information:
	./cs_solver:
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
	/home/jackie/Desktop/saturne_build2.4/prod/arch/lib/libsaturne.so.0:
		libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
		librt.so.1 (GLIBC_2.2.5) => /lib64/librt.so.1
		libxml2.so.2 (LIBXML2_2.4.30) => /opt/libxml2//lib/libxml2.so.2
		libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
		libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
		libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/home/jackie/Desktop/saturne_build2.4/prod/arch/lib/libple.so.1:
		libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/med-3.0.8//lib/libmedC.so.1:
		libgcc_s.so.1 (GCC_3.0) => /lib64/libgcc_s.so.1
		libstdc++.so.6 (GLIBCXX_3.4.9) => /usr/lib64/libstdc++.so.6
		libstdc++.so.6 (CXXABI_1.3) => /usr/lib64/libstdc++.so.6
		libstdc++.so.6 (GLIBCXX_3.4) => /usr/lib64/libstdc++.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/hdf5/lib/libhdf5.so.9:
		libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
		libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
		libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.7) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/libxml2//lib/libxml2.so.2:
		libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
		libz.so.1 (ZLIB_1.2.2.3) => /lib64/libz.so.1
		libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.7) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
		libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
	/lib64/libz.so.1:
		libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/usr/lib64/nvidia/libcuda.so.1:
		libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
		libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
		librt.so.1 (GLIBC_2.2.5) => /lib64/librt.so.1
		libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/lib64/librt.so.1:
		libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
		libpthread.so.0 (GLIBC_PRIVATE) => /lib64/libpthread.so.0
		libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
		libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
	/opt/pgi/linux86-64/16.3/lib/libpgc.so:
		ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/usr/lib64/libnuma.so.1:
		ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
		libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/pgi/linux86-64/16.3/lib/libpgmp.so:
		ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
		ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
		libc.so.6 (GLIBC_2.3.4) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/pgi/linux86-64/16.3/lib/libpgftnrtl.so:
		ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/lib64/libdl.so.2:
		ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
		libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/pgi/linux86-64/16.3/lib/libpgf90rtl.so:
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/pgi/linux86-64/16.3/lib/libpgf90.so:
		ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/pgi/linux86-64/16.3/lib/libpgf90_rpm1.so:
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/pgi/linux86-64/16.3/lib/libpgf902.so:
		ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/pgi/linux86-64/16.3/lib/libcudadevice.so:
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/pgi/linux86-64/16.3/lib/libaccapi.so:
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
		ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
	/opt/pgi/linux86-64/16.3/lib/libaccg.so:
		ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
		libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/pgi/linux86-64/16.3/lib/libaccn.so:
		ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/opt/pgi/linux86-64/16.3/lib/libaccg2.so:
		ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/lib64/libpthread.so.0:
		ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
		ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
		ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
		libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
		libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/lib64/libm.so.6:
		libc.so.6 (GLIBC_PRIVATE) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/lib64/libc.so.6:
		ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
		ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
	/usr/lib64/libstdc++.so.6:
		libm.so.6 (GLIBC_2.2.5) => /lib64/libm.so.6
		ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
		libgcc_s.so.1 (GCC_4.2.0) => /lib64/libgcc_s.so.1
		libgcc_s.so.1 (GCC_3.3) => /lib64/libgcc_s.so.1
		libgcc_s.so.1 (GCC_3.0) => /lib64/libgcc_s.so.1
		libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.3) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.3.2) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/lib64/libgcc_s.so.1:
		libc.so.6 (GLIBC_2.4) => /lib64/libc.so.6
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6
	/usr/lib64/nvidia/libnvidia-fatbinaryloader.so.367.48:
		libdl.so.2 (GLIBC_2.2.5) => /lib64/libdl.so.2
		libpthread.so.0 (GLIBC_2.2.5) => /lib64/libpthread.so.0
		libc.so.6 (GLIBC_2.2.5) => /lib64/libc.so.6

The same is true for the indirectly used libcuda.so and its dependencies:

[jackie@localhost lib64]$ ls -al /usr/lib64/nvidia/libcuda.so*
lrwxrwxrwx. 1 root root      17 Oct 13 09:06 /usr/lib64/nvidia/libcuda.so.1 -> libcuda.so.367.48
-rwxr-xr-x. 1 root root 8222824 Sep  4 08:52 /usr/lib64/nvidia/libcuda.so.367.48

[jackie@localhost lib64]$ ls -al /lib64/libm*
-rwxr-xr-x. 1 root root 599392 Jan 28  2015 /lib64/libm-2.12.so
lrwxrwxrwx. 1 root root     12 Mar 30  2015 /lib64/libm.so.6 -> libm-2.12.so

[jackie@localhost lib64]$ ls -al /lib64/librt*
-rwxr-xr-x. 1 root root 47112 Jan 28  2015 /lib64/librt-2.12.so
lrwxrwxrwx. 1 root root    13 Mar 30  2015 /lib64/librt.so.1 -> librt-2.12.so

[jackie@localhost lib64]$ ls -al /lib64/libpthread*
-rwxr-xr-x. 1 root root 145896 Jan 28  2015 /lib64/libpthread-2.12.so
lrwxrwxrwx. 1 root root     18 Mar 30  2015 /lib64/libpthread.so.0 -> libpthread-2.12.so

[jackie@localhost lib64]$ ls -al /lib64/libc*
-rwxr-xr-x. 1 root root 1926760 Jan 28  2015 /lib64/libc-2.12.so
lrwxrwxrwx. 1 root root      12 Mar 30  2015 /lib64/libc.so.6 -> libc-2.12.so

best,

Jackie

Hi Jackie,

When you see this error, it means that the runtime is unable to load the CUDA driver runtime library, “libcuda.so”. Typically it’s because the CUDA driver is not installed, but it could also be due to permissions, or the library could be in a non-standard location.

I’m still unclear what the difference is between the working “regular user” case and the failing case. If you look at the differences between these two environments, this might give you a clue as to the problem.

I get the error. The executable’s library version information follows:

libcuda.so is loaded by the OpenACC runtime and not directly linked to your binary. This allows multiple device targets to be compiled into the same binary.

[jackie@localhost lib64]$ ls -al /usr/lib64/nvidia/libcuda.so*
lrwxrwxrwx. 1 root root 17 Oct 13 09:06 /usr/lib64/nvidia/libcuda.so.1 -> libcuda.so.367.48
-rwxr-xr-x. 1 root root 8222824 Sep 4 08:52 /usr/lib64/nvidia/libcuda.so.367.48

“/usr/lib64/nvidia” isn’t a typical installation location, so the loader may not be able to find it. Can you try setting the environment variable “LD_LIBRARY_PATH=/usr/lib64/nvidia:$LD_LIBRARY_PATH” to see if that helps? If not, can you try reinstalling the CUDA driver in “/usr/lib64”?

  • Mat

Thanks, Mat,

I’ve tried the library-path settings the way you suggest, but I still get the same error.

I’m still unclear what the difference is between the working “regular user” case and the failing case.

“The working regular-user case” is when I compile the OpenACC examples downloaded from websites; those work. But this case doesn’t, and I get the same failure when compiling as root.

“/usr/lib64/nvidia” isn’t a typical installation location, so the loader may not be able to find it. Can you try setting the environment variable “LD_LIBRARY_PATH=/usr/lib64/nvidia:$LD_LIBRARY_PATH” to see if that helps? If not, can you try reinstalling the CUDA driver in “/usr/lib64”?

I tried setting the environment variable “LD_LIBRARY_PATH=/usr/lib64/nvidia:$LD_LIBRARY_PATH” and reinstalling. Nothing changed; I still get exactly the same message:

Current region was compiled for:
NVIDIA Tesla GPU sm30 sm35 sm30 sm35 sm50
Available accelerators:
device[1]: Native X86 (CURRENT DEVICE)
The accelerator does not match the profile for which this program was compiled
solver script exited with status 1.



libcuda.so is loaded by the OpenACC runtime and not directly linked to your binary.

Running ldd -u on the executable, I see the “*.so” libraries listed as unused, so I know that libcuda.so isn’t used directly. Does this mean that when I run the executable, libcuda.so is loaded by the OpenACC runtime and doesn’t need to be linked?

[root@localhost code_saturne]# ldd -u cs_solver 
Unused direct dependencies:
	/home/webber/cs_development/cs-man-build2/prod/arch/lib/libsaturne.so.0
	/home/webber/cs_development/cs-man-build2/prod/arch/lib/libple.so.1
	/opt/med-3.0.8//lib/libmedC.so.1
	/opt/hdf5/lib/libhdf5.so.9
	libmpi.so.1
	/usr/local/libxml2//lib/libxml2.so.2
	/opt/pgi/linux86-64/2016/lib/libblas.so.0
	/usr/lib64/libz.so.1
	/usr/lib64/libcuda.so.1
	/usr/lib64/librt.so.1
	/opt/pgi/linux86-64/2016/lib/libpgc.so
	/usr/lib64/libnuma.so.1
	/opt/pgi/linux86-64/2016/lib/libpgmp.so
	/opt/pgi/linux86-64/2016/lib/libpgftnrtl.so
	/usr/lib64/libdl.so.2
	/opt/pgi/linux86-64/2016/lib/libpgf90rtl.so
	/opt/pgi/linux86-64/2016/lib/libpgf90.so
	/opt/pgi/linux86-64/2016/lib/libpgf90_rpm1.so
	/opt/pgi/linux86-64/2016/lib/libpgf902.so
	/opt/pgi/linux86-64/2016/lib/libaccapi.so
	/opt/pgi/linux86-64/2016/lib/libaccg.so
	/opt/pgi/linux86-64/2016/lib/libaccn.so
	/opt/pgi/linux86-64/2016/lib/libaccg2.so
	/opt/pgi/linux86-64/2016/lib/libcudadevice.so
	/usr/lib64/libpthread.so.0
	/usr/lib64/libm.so.6
	/usr/lib64/libc.so.6

I have looked at many similar problems in this forum, but they are different from mine. I can run the samples from the website but cannot run this executable. Could this problem be connected to libtool, the Python makefile, or something similar?

Kind regards,

Jackie

Hi Jackie,

The PGI OpenACC runtime uses dlopen to open “libcuda.so”. The error indicates that dlopen is failing to open the library.
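You can check this directly with a small stand-alone program. The sketch below is a hypothetical diagnostic (the function name and messages are mine, not part of the PGI runtime) that attempts the same kind of dlopen() call and prints the loader's error string, which usually pinpoints whether the problem is a missing file, permissions, or a bad path:

```c
#include <stdio.h>
#include <dlfcn.h>

/* Diagnostic sketch (not part of Code_Saturne or the PGI runtime):
 * mimic the dlopen() call the OpenACC runtime makes when it loads
 * the CUDA driver library.  Returns 0 when the library loads. */
int try_dlopen(const char *name)
{
    void *handle = dlopen(name, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        /* dlerror() reports why the load failed: file not found,
         * permission denied, unresolved symbols, wrong ELF class, ... */
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                name ? name : "NULL", dlerror());
        return 1;
    }
    printf("dlopen(%s) succeeded\n", name ? name : "NULL");
    dlclose(handle);
    return 0;
}
```

Calling, say, `try_dlopen("libcuda.so.1")` from a tiny main (link with -ldl on older glibc) in the same environment where cs_solver fails should reproduce the loader's complaint.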

“The working regular-user case” is when I compile the OpenACC examples downloaded from websites; those work. But this case doesn’t, and I get the same failure when compiling as root.

This doesn’t make sense to me. There should be no difference between running the example codes and running a user code within the same environment on the same system.

Can you please run the utilities “pgaccelinfo” and “nvidia-smi”?
Also, can you verify the example codes are actually running on your device? (i.e. set the environment variable PGI_ACC_NOTIFY and note if there are any kernel launches).

  • Mat

Thanks, Mat,

The PGI OpenACC runtime uses dlopen to open “libcuda.so”. The error indicates that dlopen is failing to open the library.

I use libtool to cross-compile a free software project, whose functions (with OpenACC directives added) are called through a GUI interface. Does this setup have an impact on dlopen? Can I use a script to offload to the GPU?

The following is the output of the utilities “pgaccelinfo” and “nvidia-smi”:

[jackie@localhost $HOME]$ pgaccelinfo 

CUDA Driver Version:           8000
NVRM version:                  NVIDIA UNIX x86_64 Kernel Module  367.48  Sat Sep  3 18:21:08 PDT 2016

Device Number:                 0
Device Name:                   Tesla K40c
Device Revision Number:        3.5
Global Memory Size:            11995054080
Number of Multiprocessors:     15
Number of SP Cores:            2880
Number of DP Cores:            960
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           65536
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       2147483647 x 65535 x 65535
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    745 MHz
Execution Timeout:             No
Integrated Device:             No
Can Map Host Memory:           Yes
Compute Mode:                  default
Concurrent Kernels:            Yes
ECC Enabled:                   Yes
Memory Clock Rate:             3004 MHz
Memory Bus Width:              384 bits
L2 Cache Size:                 1572864 bytes
Max Threads Per SMP:           2048
Async Engines:                 2
Unified Addressing:            Yes
Managed Memory:                Yes
PGI Compiler Option:           -ta=tesla:cc35

Device Number:                 1
Device Name:                   Quadro K5000
Device Revision Number:        3.0
Global Memory Size:            4231135232
Number of Multiprocessors:     8
Number of SP Cores:            1536
Number of DP Cores:            512
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           65536
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       2147483647 x 65535 x 65535
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    705 MHz
Execution Timeout:             Yes
Integrated Device:             No
Can Map Host Memory:           Yes
Compute Mode:                  default
Concurrent Kernels:            Yes
ECC Enabled:                   No
Memory Clock Rate:             2700 MHz
Memory Bus Width:              256 bits
L2 Cache Size:                 524288 bytes
Max Threads Per SMP:           2048
Async Engines:                 2
Unified Addressing:            Yes
Managed Memory:                Yes
PGI Compiler Option:           -ta=tesla:cc30



[jackie@localhost solution.parallel]$ nvidia-smi 
Wed Oct 19 20:04:04 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 0000:03:00.0     Off |                    0 |
| 23%   39C    P8    19W / 235W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K5000        Off  | 0000:82:00.0      On |                  Off |
| 31%   44C    P8    16W / 137W |     70MiB /  4035MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1      5862    G   /usr/bin/Xorg                                   68MiB |
+-----------------------------------------------------------------------------+

best,

Jackie

Hi Jackie,

Is the OpenACC code being included in a shared object (.so)?

If so, then you’ll need to compile without relocatable device code (RDC) by adding “-ta=tesla:nordc” to your compilation flags. RDC requires a device link step, which isn’t performed if you’re not linking with a PGI driver. The caveat of turning off RDC is that you can no longer make subroutine calls from device code (use inlining instead, via the -Minline flag) nor use global device variables (created via the declare directive).
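To illustrate the caveat, here is a minimal hypothetical C sketch (the kernel and names are mine, not from Code_Saturne) of a compute region that is safe to build with "-ta=tesla:nordc", because all the device work is written inline in the region rather than calling a separately compiled device routine:

```c
#include <stddef.h>

/* Hypothetical nordc-safe kernel: with -ta=tesla:nordc the compiler
 * cannot resolve calls to separately compiled device routines, so the
 * work is written inline in the compute region (or inlined at build
 * time via -Minline). */
void scale_by_two(size_t n, const double *restrict x, double *restrict y)
{
    /* No function calls and no "declare" globals inside the region. */
    #pragma acc parallel loop copyin(x[0:n]) copyout(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = 2.0 * x[i];
}
```

Built with, e.g., "pgcc -acc -ta=tesla:nordc -c scale.c" this needs no device link step; a non-OpenACC compiler simply ignores the pragma and runs the loop on the host.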

  • Mat