-ta=nvidia,host and libcuda.so requirement

Hi,

I was trying version 10.0 to create unified binaries that run with
or without an accelerator.
It seems, however, that libcuda.so is required in any case (even
with ACC_DEVICE=host) and must also be installed on platforms
without an accelerator.
Also, libcuda.so is not shipped with the compiler. Under openSUSE,
for instance, it's part of the video driver package, which would not
usually be installed on a machine without an accelerator.

Is this the intended behaviour? Or shouldn't the runtime system
try to avoid using any CUDA libraries when apparently no
accelerator is present or wanted?

Also, copying libcuda.so to some place listed in $LD_LIBRARY_PATH
does not help; I then get the error

call to cuInit returned error 100: No device
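
(In case it is useful for diagnosis: whether libcuda.so is a hard
link-time dependency of the unified binary or is loaded on demand by
the runtime can be checked with ldd; "mybinary" below is just a
placeholder for the compiled program:

  ldd ./mybinary | grep cuda

If nothing is listed, the runtime must be locating and loading the
library itself at run time.)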


Regards,
Norbert

Hi Norbert,

I just double-checked and did not have any problems when I created a unified binary on a system with an NVIDIA GPU and then ran it on another without a GPU. I was only able to reproduce your error when I compiled with just "-ta=nvidia". Can you please double-check that you compiled with "-ta=nvidia,host"?

Thanks,
Mat

Ok, one by one. I was actually using one of the official examples:

hostA% pgaccelinfo | grep 'Device Name'
Device Name: Tesla C1060
Device Name: Tesla C1060
hostA% pgfortran -o f2.uni f2.f90 -ta=nvidia,host -Minfo -fast
main:
      1, PGI Unified Binary version for -tp=nehalem-64 -ta=host
     20, Unrolled inner loop 8 times
     26, Generated an alternate loop for the loop
         Generated vector sse code for the loop
         Generated a prefetch instruction for the loop
     32, Generated an alternate loop for the loop
         Generated vector sse code for the loop
         Generated a prefetch instruction for the loop
     38, Loop not vectorized/parallelized: contains call
main:
      1, PGI Unified Binary version for -tp=nehalem-64 -ta=nvidia
     20, Unrolled inner loop 8 times
     25, Generating copyin(a(1:n))
         Generating copyout(r(1:n))
     26, Loop is parallelizable
         Accelerator kernel generated
         26, !$acc do parallel, vector(256)
     32, Generated an alternate loop for the loop
         Generated vector sse code for the loop
         Generated a prefetch instruction for the loop
     38, Loop not vectorized/parallelized: contains call
hostA% ./f2.uni
100000 iterations completed
1230 microseconds on GPU
1482 microseconds on host

Now hostB: same directory, same environment, but no CUDA installed (and no accelerator hardware):

hostB% pgaccelinfo | grep 'Device Name'
hostB% ./f2.uni
libcuda.so not found, exiting
hostB% ACC_DEVICE=host ./f2.uni
libcuda.so not found, exiting

At this point I noticed that I had not mentioned Fortran in my
original post, and Mat is probably using C.
So, the same test with a C example:


hostB% ./c2.uni
100000 iterations completed
1546 microseconds on GPU
1530 microseconds on host
hostB%

Aaaah, so it's probably a Fortran runtime problem.
Also interesting: hostB reports having spent some time on the non-existent GPU. But that's off-topic.

Norbert

Hi Norbert,

The example code you are using has the following line:

  call acc_init( acc_device_nvidia )

In other words, by using this runtime call, the code forces the use of the NVIDIA device. Changing the acc_init argument to "acc_device_default" will allow the unified binary to run on either target.
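
Concretely, the change would look something like this (a minimal sketch, assuming the example's use of the PGI accel_lib module; the program body here is illustrative, not the actual f2.f90):

  program f2_sketch
    use accel_lib                        ! PGI accelerator runtime module
    implicit none
    integer, parameter :: n = 100000
    real :: a(n), r(n)
    integer :: i

    ! Was: call acc_init( acc_device_nvidia )  <- forces the NVIDIA device
    call acc_init( acc_device_default )        ! let the runtime choose

    a = 1.0
  !$acc region
    do i = 1, n
       r(i) = a(i) * 2.0
    end do
  !$acc end region
    print *, r(1), r(n)
  end program f2_sketch

With acc_device_default, the runtime uses the NVIDIA device when one is present and falls back to the host code generated by -ta=nvidia,host otherwise.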

Note that the c2 C example has the same issue.
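
If you would rather make the decision explicit in the program, the runtime can also be queried first. A sketch under the same accel_lib assumption (I have not verified how acc_get_num_devices behaves when libcuda.so itself is missing):

  program choose_device
    use accel_lib
    implicit none
    integer :: ndev
    ! How many NVIDIA devices does the runtime see?
    ndev = acc_get_num_devices( acc_device_nvidia )
    if ( ndev > 0 ) then
       call acc_init( acc_device_nvidia )
    else
       call acc_init( acc_device_host )
    end if
    print *, 'NVIDIA devices found:', ndev
  end program choose_device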

Hope this helps,
Mat