I’m getting “trap invalid opcode ip:” errors. I compile my source code on a CentOS 7 system with kernel 3.10.0-327.13.1.el7.x86_64, called “login-node”. After compilation, I submit the binary to my SLURM cluster in a batch script. The script is then executed on another machine called “execution-node-01” (with the same kernel version). There, the binary generates “trap invalid opcode ip:” errors. However, if I compile my source code directly on “execution-node-01” and run it there, I get zero errors, so I don’t understand this problem. Same kernel, same compilers, same CUDA…
I’m running PGI-18.10 and CUDA-9.0 (installed by me).
I compiled with “pgcc -fast -acc -ta=tesla:cc60 -Minfo=all ./test.c -o ./test”. The GPU in “execution-node-01” is an NVIDIA GeForce GTX 1080 with “PGI Default Target: -ta=tesla:cc60”.
Could anyone help me?
While I’m not familiar with the error “trap invalid opcode ip”, I suspect this is a host-side issue where the program encounters an illegal instruction.
What CPU architectures are on the two systems?
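One way to answer that is to compare the CPU feature flags each node reports. The following is only a sketch, assuming both nodes run Linux and expose /proc/cpuinfo; “missing_flags” is a hypothetical helper name, not part of PGI or SLURM.

```shell
# Hypothetical helper: print every CPU flag that appears in the first list
# (build node) but not in the second list (execution node).
# Usage: missing_flags "<build-node flags>" "<execution-node flags>"
missing_flags() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;               # flag is available on both nodes
            *) printf '%s\n' "$flag" ;;   # flag exists only on the build node
        esac
    done
}

# On each node, the flag list can be read with (Linux only):
#   grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2
# Paste the two outputs as the two arguments of missing_flags.
```

If the output lists flags such as avx or avx2, the binary built on “login-node” may contain instructions that “execution-node-01” cannot execute.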
You may need to compile with the “-tp” flag followed by a target architecture. By default we generate CPU instructions for the system the code is compiled on, and those instructions may not be supported on older architectures. For example, if you build on a Skylake CPU, we’ll add AVX instructions, which aren’t supported on older Intel CPUs such as Penryn. In this case, you’d compile with “-tp penryn” so the instructions will be limited to those supported on Penryn.
You can see all supported architectures by running “pgcc -help -tp”. There’s also the “-tp px” option, which produces code that will run on any modern x86-based architecture but may limit some of the optimizations that can be applied.
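As a rough illustration, the original compile line could be rebuilt on “login-node” with a generic target. This sketch assumes “px” is acceptable for your performance needs; TP_TARGET is a hypothetical variable name, and you would replace its value with a specific target from “pgcc -help -tp” if you know the CPU in “execution-node-01”.

```shell
# Assumption: "px" (generic x86) is used here; swap in a specific target
# reported by "pgcc -help -tp" if the execution node's CPU is known.
TP_TARGET=px

if command -v pgcc >/dev/null 2>&1; then
    # Same flags as the original build, plus an explicit CPU target.
    pgcc -tp "$TP_TARGET" -fast -acc -ta=tesla:cc60 -Minfo=all ./test.c -o ./test
else
    echo "pgcc not found in PATH; load the PGI environment first" >&2
fi
```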
Hope this helps,