I have a problem and hopefully someone can point me in the right direction.
I am using Armadillo (a C++ linear algebra library) and want to accelerate some parts of my code with CUDA. I saw some very strange behaviour and finally figured out that the cx_mat class (and probably others as well) has a different size depending on whether I compile it with nvcc or with g++.
I came across this:
The answer there is to pass -malign-double to g++, because nvcc is supposed to do the same. I know that compilers are allowed to pack structs/classes differently.
However, in my case this did not help. Interestingly, cx_mat is smaller when compiled with nvcc, which should not be the case if nvcc packs it in 8-byte blocks and g++ doesn't.
As far as I understood the host code will be compiled by the host compiler, which should be the same in my case. How can I figure out which arguments are passed down to that host compiler by nvcc so I can set the same for my g++ part? Then everything should be fine.
Edit: Here is a small code example that shows the problem:
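// Compile with g++: g++ cudaSizeProblem.cpp -o cudaSizeProblemG++
// Compile with nvcc: nvcc cudaSizeProblem.cpp -o cudaSizeProblemNvcc
#include <iostream>
#include <armadillo>
using namespace std;
using namespace arma;
int main(int argc, char** argv)
{
    cout << "sizeof(cx_mat) = " << sizeof(cx_mat) << endl;  // differs between the two builds
}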
Edit2: I was able to intercept the call(s) to g++ by manually specifying a host compiler with -ccbin and writing a small bash script that writes its parameters to a text file. Here are the three calls:
-c -x c++ -D__NVCC__ -I/usr/local/cuda-10.0/bin/…/targets/x86_64-linux/include -D__CUDACC_VER_MAJOR__=10 -D__CUDACC_VER_MINOR__=0 -D__CUDACC_VER_BUILD__=130 -m64 -o /tmp/tmpxft_0000349b_00000000-4_cudaSizeProblem.o cudaSizeProblem.cpp
-c -x c++ -DFATBINFILE="/tmp/tmpxft_0000349b_00000000-3_cudaSizeProblemNvcc_dlink.fatbin.c" -DREGISTERLINKBINARYFILE="/tmp/tmpxft_0000349b_00000000-2_cudaSizeProblemNvcc_dlink.reg.c" -I. -D__NV_EXTRA_INITIALIZATION= -D__NV_EXTRA_FINALIZATION= -D__CUDA_INCLUDE_COMPILER_INTERNAL_HEADERS__ -I/usr/local/cuda-10.0/bin/…/targets/x86_64-linux/include -D__CUDACC_VER_MAJOR__=10 -D__CUDACC_VER_MINOR__=0 -D__CUDACC_VER_BUILD__=130 -m64 -o /tmp/tmpxft_0000349b_00000000-6_cudaSizeProblemNvcc_dlink.o /usr/local/cuda-10.0/bin/crt/link.stub
-m64 -o cudaSizeProblemNvcc -Wl,--start-group /tmp/tmpxft_0000349b_00000000-6_cudaSizeProblemNvcc_dlink.o /tmp/tmpxft_0000349b_00000000-4_cudaSizeProblem.o -L/usr/local/cuda-10.0/bin/…/targets/x86_64-linux/lib/stubs -L/usr/local/cuda-10.0/bin/…/targets/x86_64-linux/lib -lcudadevrt -lcudart_static -lrt -lpthread -ldl -Wl,--end-group
I cannot see anything suspicious.