I am trying to create a static library (an archive file in principle, though just a single .o in this example) on Ubuntu 20.04 with nvc 21.9 and link it into a final executable with gcc. Everything works when I link with nvc, or when I build a shared object and link that with gcc, but not when I link the object file with gcc. In that case the link succeeds (after adding a bunch of libs), but the executable fails at run time:
$ ./foo-main
Accelerator Fatal Error: No CUDA device code available
File: /home/uie55546/git/ti-cuda-examples/src/scratch/foo.c
Function: process:4
Line: 4
Is creating a static archive with device code and linking with gcc possible?
I found a lot of older info on this topic dealing with -ta=nordc issues in older versions of the PGI compilers that should be fixed in the HPC SDK compilers. I am hoping that there is updated documentation on how to do this somewhere that I simply haven't found yet, and I would appreciate some pointers.
Details of what I did:
//foo.h
typedef struct points {
    float* x; float* y;
    int n;
} points;
void process(points point);
//foo.c
#include "foo.h"
void process(points p) {
    #pragma acc parallel loop copy(p, p.x[:p.n]) copyin(p.y[:p.n])
    for (int i=0; i<p.n; ++i ) p.x[i] += p.y[i];
}
//foo-main.c
#include <stdlib.h>
#include <stdio.h>
#include "foo.h"
int main() {
    points p;
    p.n = 1000;
    p.x = (float*) malloc(sizeof(float) * p.n);
    p.y = (float*) malloc(sizeof(float) * p.n);
    process(p);
    printf("all done, exiting\n");
}
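As a side note, the test program only shows that process() returned without a fatal error. Below is a minimal sketch of helpers that fill the inputs with known values and check the result afterwards, so a run also confirms that the loop actually computed x[i] += y[i]. The names points_init, points_check and points_free are hypothetical and not part of my real code; the struct is repeated from foo.h only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as the struct in foo.h, repeated so this sketch stands alone. */
typedef struct points {
    float *x;
    float *y;
    int n;
} points;

/* Hypothetical helper: allocate both arrays and fill them with known values
 * (x[i] = i, y[i] = 2*i). Returns 0 on success, -1 if an allocation fails. */
static int points_init(points *p, int n) {
    assert(p != NULL && n > 0);
    p->n = n;
    p->x = malloc(sizeof(float) * (size_t)n);
    p->y = malloc(sizeof(float) * (size_t)n);
    if (p->x == NULL || p->y == NULL) {
        free(p->x);
        free(p->y);
        p->x = p->y = NULL;
        return -1;
    }
    for (int i = 0; i < n; ++i) {
        p->x[i] = (float)i;
        p->y[i] = 2.0f * (float)i;
    }
    return 0;
}

/* Hypothetical helper: after process(), x[i] should hold i + 2*i = 3*i.
 * The values are small integers, so an exact float comparison is safe here.
 * Returns the number of elements that do not match. */
static int points_check(const points *p) {
    int mismatches = 0;
    for (int i = 0; i < p->n; ++i) {
        if (p->x[i] != 3.0f * (float)i) {
            ++mismatches;
        }
    }
    return mismatches;
}

/* Hypothetical helper: release both arrays. */
static void points_free(points *p) {
    free(p->x);
    free(p->y);
    p->x = p->y = NULL;
}
```

In foo-main.c this would be used by calling points_init(&p, 1000) before process(p), then treating a nonzero return from points_check(&p) as a failure, and finally calling points_free(&p).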
Object file:
$ nvc -fPIC -Minfo=accel -O3 -acc -c -o foo.o foo.c
process:
4, Generating copy(p) [if not already present]
Generating copyin(p.y[:p.n]) [if not already present]
Generating copy(p.x[:p.n]) [if not already present]
Generating Tesla code
6, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
$ gcc -std=c11 foo-main.c -o foo-main foo.o -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/lib -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/cuda/11.4/targets/x86_64-linux/lib -Wl,-rpath /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/lib /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/lib/acc_init_link_cuda.o /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/lib/acc_init_link_host.o /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/lib/acc_init_link_acc.o -lacchost -laccdevaux -laccdevice -ldl -lcudadevice -latomic -lnvhpcatm -lstdc++ -lnvomp -lnvc -lnvcpumath -lm -lcudadevrt -lcudart_static -lrt -lpthread
$ ./foo-main
Accelerator Fatal Error: No CUDA device code available
File: /home/uie55546/git/ti-cuda-examples/src/scratch/foo.c
Function: process:4
Line: 4
Shared library:
$ nvc -fPIC -Minfo=accel -O3 -acc -shared -o foo.so foo.c
process:
4, Generating copy(p) [if not already present]
Generating copyin(p.y[:p.n]) [if not already present]
Generating copy(p.x[:p.n]) [if not already present]
Generating Tesla code
6, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
$ gcc -std=c11 foo-main.c -o foo-main foo.so -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/lib -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/cuda/11.4/targets/x86_64-linux/lib -Wl,-rpath /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/lib /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/lib/acc_init_link_cuda.o /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/lib/acc_init_link_host.o /opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/lib/acc_init_link_acc.o -lacchost -laccdevaux -laccdevice -ldl -lcudadevice -latomic -lnvhpcatm -lstdc++ -lnvomp -lnvc -lnvcpumath -lm -lcudadevrt -lcudart_static -lrt -lpthread
$ LD_LIBRARY_PATH="." ./foo-main
all done, exiting
Linking with nvc:
$ nvc -acc foo-main.c -o foo-main foo.o
foo-main.c:
$ ./foo-main
all done, exiting
For reference, I used the -dryrun option of nvc to get the required libs for the gcc command line.