So firstly this is a Red Hat Enterprise Linux machine with an NVIDIA Quadro K4200 device, and I felt it reasonable to at least try a bit of CUDA sample code. I installed the CUDA 9.0 toolkit earlier this year and nvcc seems to do what it claims to do. However, the first and most trivial bit of code compiles and links but does nothing with the GPU. Very strange.
The instructions at that page don’t really work and I don’t know why. However, code is code, so it should compile and link neatly, which it does thus :
$ nvcc -I/usr/local/cuda/include -I. -arch=compute_30 -x cu -dc v3.cpp -o v3.o
$ nvcc -I/usr/local/cuda/include -I. -arch=compute_30 -x cu -dc particle.cpp -o particle.o
$ nvcc -I/usr/local/cuda/include -I. -arch=compute_30 -x cu -dc main.cpp -o main.o
That gives me the three object files and then :
$ nvcc -L/usr/local/cuda/lib64 -arch=compute_30 -o app main.o particle.o v3.o
Which results in the particle calculation executable “app” :
$ ./app
Moved 1000000 particles 100 steps. Average distance traveled is |(30.059565, 36.016434, 23.756998)| = 52.584751
However this has nothing to do with the GPU and is entirely CPU bound.
$ ldd app
    linux-vdso.so.1 => (0x00007ffd0fd0e000)
    librt.so.1 => /lib64/librt.so.1 (0x00007fd33deb7000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd33dc9b000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fd33da96000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fd33d78f000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fd33d48d000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd33d276000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fd33cea9000)
    /lib64/ld-linux-x86-64.so.2 (0x000055f7ca973000)
So I am not sure what the trivial issue is, but I am guessing that it has something to do with the need to mark device code as opposed to host code, and that linking against libcudart would also be helpful.
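For comparison, here is a minimal sketch of what I understand code that really runs on the GPU has to look like. It is not taken from the particle sample: the kernel `scale`, the helper `run_scale`, and the file name `scale.cu` are all hypothetical names I made up for illustration. The point is only that work reaches the device through a `__global__` function launched with the `<<<blocks, threads>>>` syntax, with the data copied over explicitly by `cudaMemcpy`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the particle sample):
// each GPU thread multiplies one array element by factor.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical helper: fills an array with 1.0f, scales it on the GPU,
// and returns the number of wrong elements, or -1 on a CUDA error.
int run_scale(int n, float factor)
{
    std::vector<float> host(n, 1.0f);
    size_t bytes = n * sizeof(float);

    float *dev = 0;
    cudaError_t err = cudaMalloc(&dev, bytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return -1;
    }

    err = cudaMemcpy(dev, &host[0], bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scale<<<blocks, threads>>>(dev, factor, n);  // the actual GPU launch
        err = cudaGetLastError();                    // catches launch failures
    }
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(&host[0], dev, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev);

    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return -1;
    }

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        if (host[i] != factor) {
            ++bad;
        }
    }
    return bad;
}

int main()
{
    int bad = run_scale(1 << 20, 2.0f);
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Something like `nvcc -arch=sm_30 scale.cu -o scale` should build it. If the sample sources contain no `__global__` functions and no kernel launches, then presumably nvcc just hands everything to the host compiler, which would explain the CPU-bound behaviour. Also, as far as I can tell nvcc links the CUDA runtime statically by default, so the absence of libcudart in the ldd output may not prove anything by itself.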
Any hints ?
That doesn't get past the compile stage. Not sure why, but that is another topic.