NVCC fails with segfault

Hey everyone, I’m at a bit of a loss here.

I’m running a Jetson Nano 2GB with Jetpack 4.6.1 installed.

I have a CUDA-based application that I wrote on Linux and want to try on a Jetson for the first time.
However, when I compile the application on the Jetson, nvcc (version 10.2, installed automatically by SDK Manager) fails with a segmentation fault.

It doesn’t report any errors or syntax problems in the code; it just fails with exit code 139. Even more curiously, if I compile the application on x64 Ubuntu 18.04 with CUDA and nvcc version 10.1, there are no issues.

I can’t seem to find the root cause of the problem. Is it a compiler bug? I’d be glad for any hint that could point me in the right direction.

I can’t tell you what the error is, but it would be interesting to find out if you can run nvcc in gdb. Even if you can it is likely there are no debug symbols, but if for some reason you could get a backtrace with information on the stack frame (other than just addresses) it would help.

Another possibility is to use strace to gather information (you might need to “sudo apt-get install strace”). You would have to run this as root, which means your intermediate output would be owned by root, and you’d perhaps have to manually delete that output when you are done testing, but this would provide some interesting logs:

sudo strace -oTraceLog.txt nvcc ...whatever arguments you normally use...
# Compress that and attach as a file. Example:
sudo gzip -9 TraceLog.txt

I am curious if someone from NVIDIA has a reference to the various error codes, and of course especially for the current code 139?
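
For what it’s worth, 139 is probably not a CUDA-specific code. By shell convention, a process killed by a signal is reported with exit status 128 plus the signal number, and SIGSEGV is signal 11 on Linux, so 139 would simply be the segmentation fault as seen from the outside. A minimal sketch to check this on any Linux machine (the self-killing child shell is only a stand-in for the crashing compiler):

```shell
# Stand-in for the crashing tool: a child shell that sends SIGSEGV to itself.
status=0
sh -c 'kill -s SEGV $$' || status=$?

# Exit statuses above 128 mean "killed by signal (status - 128)".
signal=$((status - 128))
signame=$(kill -l "$signal")
echo "exit status $status = 128 + $signal (SIG$signame)"
```

If that reading is right, the number itself says nothing about why the tool crashed, only that it did.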

Thanks for the tip. Unfortunately, strace returns the error:

strace: can’t stat ‘nvcc’: No such file or directory

A quick Google search tells me that this happens if the given argument is not an executable, yet I can definitely execute nvcc. Any ideas where that might be coming from?

In that case nvcc is not in your default shell search path. You’d simply use the full path.
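
A likely reason the bare name fails only here: sudo usually resets PATH to a restricted list, so a directory your own shell searches is not visible to the command run as root. A small sketch that resolves the full path as the regular user first and then reuses it; the fallback location /usr/local/cuda/bin/nvcc is an assumption about a typical JetPack install, so adjust it if yours differs:

```shell
# Resolve nvcc as the regular user; fall back to the assumed JetPack location.
nvcc_path=$(command -v nvcc || echo /usr/local/cuda/bin/nvcc)
echo "nvcc resolves to: $nvcc_path"

# Print (not run) the strace invocation with the full path filled in.
trace_cmd="sudo strace -oTraceLog.txt $nvcc_path ...whatever arguments you normally use..."
echo "$trace_cmd"
```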

Log.txt (9.0 KB)

Okay, that worked. I have attached the strace output; there’s also the exit_group(139) call in there.

Is that the full log? I see only 122 lines, and nothing obvious within those lines. The full log would probably be much much larger. Usually such an error is within the last 500 lines or so.

Yeah, I also wondered why it’s so short, but it’s the full log as produced by strace.

Can I verify you are running nvcc as a regular user?

Perhaps there is a reduction in output due to not counting other threads or forks. Try running strace like this, and see if the produced trace log becomes larger:

strace -f -oTraceLog.txt /usr/local/cuda/bin/nvcc ...whatever arguments...

(assumes full path to nvcc is /usr/local/cuda/bin/nvcc)

Note that the difference between this and the earlier command is the “-f” option.

TraceLog.txt (958.4 KB)

Okay, now we have about 10000 lines. The SIGSEGV occurs at line 9378. This happens inside the PTX assembler, if I’m not mistaken. However, I still can’t pinpoint the root cause.
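
One way to narrow this down from the log is to pull out the PID that received the signal and then look at what that PID exec’d. With -f and -o, strace prefixes each line with the PID, and a delivered signal shows up as a line containing "--- SIGSEGV". A rough sketch; the two helper names are made up for illustration:

```shell
# crashing_pids LOGFILE: PIDs that received SIGSEGV in an "strace -f -o" log.
crashing_pids() {
    grep -e '--- SIGSEGV' "$1" | awk '{print $1}' | sort -u
}

# execs_for_pid LOGFILE PID: execve calls made by that PID, with line numbers,
# which shows the sub-tool that was launched in that process.
execs_for_pid() {
    grep -n -e "^$2 *execve(" "$1"
}

# Only run against a real log if one is present in the current directory.
if [ -f TraceLog.txt ]; then
    for pid in $(crashing_pids TraceLog.txt); do
        echo "PID $pid received SIGSEGV"
        execs_for_pid TraceLog.txt "$pid" || true
    done
fi
```

The execve line for the crashing PID should name the binary that actually segfaulted, along with the exact arguments nvcc passed to it.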

This is just a shot in the dark, but since the compiler is checking for language encodings just prior to one SIGSEGV, what does that user see for “echo $LANG” and “pwd” when you’re at the directory you run the nvcc command from? It is expected that a lot of encodings will be searched for and only one will be found, but if it is an encoding issue, then it should be simple to solve.

Also, it might be useful to add the “-y” option to strace so that it shows full path to any file mentioned. Can you run strace again with all of the previous options, but also with “-y”?

strace -y -f -oTraceLog.txt /usr/local/cuda/bin/nvcc ...whatever arguments...

I checked the encoding; it is en_us.UTF8, so I can’t imagine this being a problem. In fact, as far as I remember, I haven’t used any non-ASCII symbols, even in comments.

I also deliberately introduced syntactic and semantic errors into the code to see if the compiler would catch them, which it did, so it seems to me the error happens after the source has been tokenized. Do you agree with this line of thinking, or am I making a false assumption here?
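
One way to test that assumption would be to run the compilation stages separately. nvcc is a driver that runs several sub-tools in sequence, and its --dryrun and -ptx options, together with a direct ptxas call, can show which stage falls over. A sketch under assumptions: kernel.cu is a placeholder file name, and sm_53 is the Jetson Nano’s compute capability; the real arguments would need to be substituted.

```shell
src=kernel.cu          # hypothetical source file name
arch=sm_53             # Jetson Nano compute capability

if command -v nvcc >/dev/null 2>&1 && [ -f "$src" ]; then
    # List the sub-commands nvcc would run, without executing them.
    nvcc --dryrun -arch="$arch" -c "$src"

    # Stop after PTX generation; success here means the front end is fine.
    nvcc -arch="$arch" -ptx "$src" -o kernel.ptx && echo "PTX generation OK"

    # Assemble the PTX separately; a segfault here points at ptxas.
    ptxas -arch="$arch" kernel.ptx -o kernel.cubin && echo "ptxas OK"
else
    echo "nvcc or $src not found; skipping"
fi
```

If only the last step crashes, the saved kernel.ptx would be a compact thing to attach for anyone trying to reproduce it.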

Sanity checking is pretty much always a good idea. However, what do you see from the “pwd” command when in the directory you invoke the nvcc command from? There are sometimes issues (unrelated to $LANG) with path quoting.

Path is: /home/luis/gpueikonal/gpuEikonal
No quotes displayed

That path should work without issue. At this point I’m going to suggest someone from NVIDIA look up “code 139” for CUDA 10.2 and see what that indicates.

Alright, let’s see if someone from NVIDIA reads this. Thank you for your time @linuxdev :)

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Hi,

Would you mind sharing the source you compiled with?
We want to reproduce it internally to get more information first.

Thanks.
