Passing gcc compiler options to nvcc release 8.0, V8.0.26: cudafe died due to signal 11

cudafe dies during the build, and nvcc reports signal 11 (Invalid memory reference).

I have tried to solve the problem by passing the following flags:

gcc ADD_CFLAGS = -m64 -mavx2 -mfma -o -shared -pipe -time -mtune=native -fPIC -std=c++11 -Dnvcc--compiler-options='--nvlink-options --gpu-architecture=compute_52 --gpu-code=sm52 --shared --relocatable-device-code=true --compile' --ptxas-options='--allow-expensive-optimizations --gpu-name sm52 -m64'

I am on RHEL7/SL7 with an Intel Broadwell Core i7 5960 Extreme Edition CPU and triple-head, SLI-bridged Nvidia GTX90 Maxwell video cards.

I added the above flags to the config.mk file while trying to solve the problem, but I think I need them anyway to optimize the build.
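A note on the flag line above: in gcc, -D only defines a preprocessor macro, so it does not forward anything to nvcc, and -o expects an output file name, so it swallows the -shared that follows it. When a build invokes nvcc directly, gcc options are normally handed to the host compiler through nvcc's -Xcompiler (long form --compiler-options), the target GPU is selected with -gencode, relocatable device code with -rdc=true, and ptxas options with -Xptxas. The Python sketch below only assembles such an argument list, assuming a single .cu file compiled to an object file; the helper name build_nvcc_command, the file names, and the chosen flags are hypothetical and not the project's actual config.mk settings.

```python
import shlex


def build_nvcc_command(source, output, host_flags, arch="52",
                       relocatable=True, ptxas_flags=()):
    """Assemble an nvcc argument list (hypothetical helper; nothing is run).

    host_flags are gcc options; nvcc forwards them to the host compiler
    through -Xcompiler as one comma-separated list.
    """
    cmd = ["nvcc", "-m64", "-std=c++11"]
    if host_flags:
        cmd += ["-Xcompiler", ",".join(host_flags)]
    # Build SASS for sm_<arch> from PTX for the virtual arch compute_<arch>.
    cmd += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    if relocatable:
        # Short form of --relocatable-device-code=true.
        cmd.append("-rdc=true")
    if ptxas_flags:
        # Options passed through to ptxas, also comma-separated.
        cmd += ["-Xptxas", ",".join(ptxas_flags)]
    # -c compiles only; -o names the output object file.
    cmd += ["-c", source, "-o", output]
    return cmd


if __name__ == "__main__":
    cmd = build_nvcc_command(
        "kernel.cu", "kernel.o",
        host_flags=["-mavx2", "-mfma", "-fPIC", "-mtune=native"],
    )
    # Print as a shell command line that could be pasted into a terminal.
    print(shlex.join(cmd))
```

Keeping host flags and device flags in separate parameters makes it harder to hand a gcc-only option such as -mavx2 straight to nvcc, or to hide nvcc options inside a macro definition.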

Is this related to earlier bugs reported against CUDA?
If so, they were supposed to have been fixed in subsequent releases.

The problem also occurs without the -std=c++11 flag.

How do we avoid invalid memory references such as this?

Assuming the toolchain is installed correctly (no corrupt or missing files), a segfault in the compiler is never a reasonable response to any input; the compiler should instead terminate in an orderly way with an appropriate error message. A segfault like this should always be considered a bug.

I would suggest filing a bug report with NVIDIA, using the form linked from the CUDA registered developer website. In my experience, there are rarely workarounds for bugs of this nature, but if you file a bug there is a chance the compiler team has a recommendation as to how to avoid it.

Thanks for the quick reply.
I understand it could well be a bug and may consider filing a bug report.

The error message I receive is:

nvcc error : 'cudafe' died due to signal 11 (Invalid memory reference)
nvcc error : 'cudafe' core dumped

From what I have read, this has been an ongoing issue since CUDA 2.0.
By release 8.0 I would have expected the problem to be solved.

Any further ideas or suggestions?

8.0.26 is the CUDA 8 release candidate (RC).

Before you file a bug, update your system to the CUDA 8 production release, which should be 8.0.44 or thereabouts.

The error message you are reporting is a generic one that could happen with various different compiler issues. It’s almost certainly not due to a single issue that has been around since CUDA 2.0 and never fixed. So just because you are finding reports of that error message dating back to CUDA 2.0 does not mean that you are having the same underlying issue in the compiler that has never been fixed.
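To confirm which toolkit a build actually picks up, the last line of nvcc --version can be compared against the production release number mentioned above. A minimal Python sketch, assuming the output format shown in this thread ("Cuda compilation tools, release 8.0, V8.0.26") and treating 8.0.44 as the minimum wanted version; the function names are hypothetical.

```python
import re
import shutil
import subprocess

# Matches e.g. "Cuda compilation tools, release 8.0, V8.0.26".
_VERSION_RE = re.compile(r"release \d+\.\d+, V(\d+)\.(\d+)\.(\d+)")


def parse_nvcc_version(text):
    """Return the full version as a tuple of ints, or None if not found."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_nvcc_version(nvcc="nvcc"):
    """Run `nvcc --version` and parse it; None if nvcc is not on PATH."""
    if shutil.which(nvcc) is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_version(result.stdout)


if __name__ == "__main__":
    version = installed_nvcc_version()
    minimum = (8, 0, 44)  # assumed production release, per this thread
    if version is None:
        print("nvcc not found or version line not recognized")
    elif version < minimum:
        print("nvcc", version, "is older than", minimum, "- update first")
    else:
        print("nvcc", version, "is recent enough")
```

Checking the nvcc that is first on PATH matters because an old toolkit directory can remain on PATH after a new one is installed.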

Thanks txbob, I thought of the same solution and ran:

yum reinstall cuda

I get the following result:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Wed_May__4_21:01:56_CDT_2016
Cuda compilation tools, release 8.0, V8.0.26

When I run:

yum update cuda

I get "No packages marked for update".

I will check for 8.0.44

So obviously your update method is broken.

If you want to install the latest version of cuda, I suggest you go to:

http://www.nvidia.com/getcuda

find the installation guide appropriate for your OS (i.e. the Linux installation guide), and follow the instructions there.

Thanks again txbob.

Would it be safest to run:

yum remove cuda

before installing 8.0.44?

I ran
$ sudo yum remove cuda

and then followed the installation instructions for CUDA 8.0.44.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

The same build error persists:

nvcc error : 'cudafe' died due to signal 11 (Invalid memory reference)
nvcc error : 'cudafe' core dumped

Any other ideas?

As a sanity check: are you able to successfully build the example programs that ship with CUDA? If so, that means the installed toolchain is functional, and it is extremely likely you are hitting a bug in the CUDAFE component of the CUDA toolchain.
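Besides the shipped samples, a quick complementary check is to compile a trivial kernel with the same nvcc and to tell a cudafe crash apart from an ordinary compile error by looking for the message quoted in this thread. A minimal Python sketch; the trivial kernel, the helper names, and the regular expression (which accepts straight or curly quotes around the component name) are assumptions for illustration.

```python
import os
import re
import shutil
import subprocess
import tempfile

# Matches e.g. "nvcc error : 'cudafe' died due to signal 11 (...)".
_CRASH_RE = re.compile(
    r"['\u2018\u2019]([^'\u2018\u2019]+)['\u2018\u2019] died due to signal (\d+)")

# Hypothetical minimal CUDA source used only to exercise the toolchain.
TRIVIAL_KERNEL = """
__global__ void add_one(int *x) { x[threadIdx.x] += 1; }
"""


def find_component_crash(stderr_text):
    """Return (component, signal) if nvcc reports a crashed sub-tool."""
    match = _CRASH_RE.search(stderr_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def compile_check(source_text=TRIVIAL_KERNEL, nvcc="nvcc", extra_args=()):
    """Compile source_text to an object file; return a short status string."""
    if shutil.which(nvcc) is None:
        return "nvcc not found"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "check.cu")
        with open(src, "w") as f:
            f.write(source_text)
        result = subprocess.run(
            [nvcc, *extra_args, "-c", src, "-o", os.path.join(tmp, "check.o")],
            capture_output=True, text=True)
    if result.returncode == 0:
        return "ok"
    crash = find_component_crash(result.stderr)
    if crash is not None:
        return f"{crash[0]} crashed with signal {crash[1]}"
    return "compile error"
```

If compile_check() returns "ok" while the real project still crashes, the installation is probably fine. Passing the project's own flags through extra_args can then show whether a particular option is involved.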

For filing a bug report with NVIDIA, you would want to prepare the smallest possible self-contained code that reproduces the issue and attach that to the bug report. A single, short, source code file plus the nvcc commandline invocation that triggers the segfault would be ideal for this purpose.
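Shrinking the failing source by hand is tedious. A simple greedy approach deletes chunks of lines and keeps each deletion only if the crash still reproduces. Below is a minimal Python sketch of that idea, a simplified form of delta debugging. The still_fails predicate is supplied by the caller; it could, for example, write the candidate lines to a .cu file, run the failing nvcc command line, and look for the "died due to signal 11" message. The function name is hypothetical, and dedicated tools such as C-Reduce do this far more thoroughly.

```python
def reduce_lines(lines, still_fails):
    """Greedily drop chunks of lines while still_fails(candidate) stays True.

    lines: list of source lines that currently reproduce the failure.
    still_fails: callable taking a list of lines and returning True if the
    failure (e.g. the cudafe segfault) still occurs for that candidate.
    """
    if not still_fails(lines):
        raise ValueError("the starting input does not reproduce the failure")
    chunk = max(len(lines) // 2, 1)
    while True:
        i = 0
        removed_any = False
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate   # keep the smaller input, retry same spot
                removed_any = True
            else:
                i += chunk          # this chunk is needed; move past it
        if chunk == 1 and not removed_any:
            return lines            # no single line can be removed any more
        if not removed_any:
            chunk = max(chunk // 2, 1)
```

The result is only as good as the predicate. It should check for the specific cudafe crash rather than any non-zero exit status; otherwise the reducer may drift toward an unrelated syntax error.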

Agreed: do what njuffa said, then file a bug.