-
Ensure that the application runs fine without the tool. You likely already did that, so just double-checking.
To demonstrate, here are two applications that both run fine without the tool. For the second, longer application, the crash under the profiler seems to happen around the time of the first CUDA runtime activity.
./matrixMul
[Matrix Multiply Using CUDA] - Starting…
GPU Device 0: “TITAN V” with compute capability 7.0
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
done
Performance= 2539.18 GFlop/s, Time= 0.052 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performancemeasurements. Results may vary when GPU Boost is enabled.
nv-nsight-cu-cli ./matrixMul
[Matrix Multiply Using CUDA] - Starting…
==PROF== Connected to process 72453 (/home/tnallen/build/NVIDIA_CUDA-10.0_Samples/0_Simple/matrixMul/matrixMul)
==ERROR== The application returned an error code (11)
==WARNING== No kernels were profiled
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option
./w2v -read-vocab vocab1/text8.vocab -read-corpus-cache corpi/cache/text8.corpus_cache -debug 1 -output grad2.emb -epoch 20 -threads 4 -streams 4
number of epochs: 20
max-sentence-length: 1000
sample: 0.001000
window size: 5
initial learning rate: 0.02500
number of negative samples: 5
layer size: 100
kernel batch size: 200
streams per thread: 4
number of threads: 4
train using file:
Loading vocabulary from file: vocab1/text8.vocab
Vocab size: 71291
Words in train file: 16718844
Loading corpus cache from file: corpi/cache/text8.corpus_cache
Load 99.98%
Loaded
Initializing network
Network ready
Kernel Selection: neg_w2v_kernel_small – Block Structure == x <negatives+1>
Kernel Structure: <<<(6,200,1), (100,1,1)>>>
— cut for brevity —
Saving model to: grad2.emb
nv-nsight-cu-cli ./w2v -read-vocab vocab1/text8.vocab -read-corpus-cache corpi/cache/text8.corpus_cache -debug 1 -output grad2.emb -epoch 20 -threads 4 -streams 4
number of epochs: 20
max-sentence-length: 1000
sample: 0.001000
window size: 5
initial learning rate: 0.02500
number of negative samples: 5
layer size: 100
kernel batch size: 200
streams per thread: 4
number of threads: 4
train using file:
Loading vocabulary from file: vocab1/text8.vocab
Vocab size: 71291
Words in train file: 16718844
Loading corpus cache from file: corpi/cache/text8.corpus_cache
Load 99.98%
Loaded
Initializing network
==PROF== Connected to process 72615 (/home/tnallen/dev/word2vec_2/w2v)
==ERROR== The application returned an error code (11)
==WARNING== No kernels were profiled
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option
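A minimal Python sketch of the first check: run the application directly, with no profiler, and report whether it exited normally or was killed by a signal. It assumes (the tool's output does not confirm this) that the "error code (11)" reported above is a signal number. On Linux, signal 11 is SIGSEGV, which would match the segfault that dmesg reports further down. `describe_exit` and the file name `describe_exit.py` are hypothetical.

```python
import signal
import subprocess
import sys


def describe_exit(cmd):
    """Run cmd directly (no profiler) and describe how it terminated."""
    rc = subprocess.run(cmd).returncode
    if rc < 0:
        # On POSIX, subprocess reports "killed by signal N" as returncode -N.
        return f"killed by {signal.Signals(-rc).name}"
    return f"exited with status {rc}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 describe_exit.py ./matrixMul
    print(describe_exit(sys.argv[1:]))
```

For both applications above this should print "exited with status 0" when they are run on their own.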
-
Make sure that the nv-nsight-cu-cli executable really is the one you intend to run. It looks like you are starting it from $PATH, so you could try using the absolute path to nv-nsight-cu-cli, or alternatively check the output of nv-nsight-cu-cli --version.
[tnallen@voltron matrixMul]$ nv-nsight-cu-cli --version
NVIDIA (R) Nsight Compute Command Line Profiler
Copyright (c) 2012-2019 NVIDIA Corporation
Version 2019.5.0 (Build 27346997)
/usr/local/cuda-10.2/bin/nv-nsight-cu-cli ./matrixMul
[Matrix Multiply Using CUDA] - Starting…
==PROF== Connected to process 72728 (/home/tnallen/build/NVIDIA_CUDA-10.0_Samples/0_Simple/matrixMul/matrixMul)
==ERROR== The application returned an error code (11)
==WARNING== No kernels were profiled
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option
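The same check as the absolute-path run above, written as a small Python sketch. It reports which file $PATH actually resolves `nv-nsight-cu-cli` to, following symlinks (roughly `which` plus `readlink -f`). `resolve_tool` is a hypothetical helper and is not part of Nsight Compute.

```python
import os
import shutil


def resolve_tool(name):
    """Return (path found via $PATH, symlink-resolved path), or None if not found."""
    found = shutil.which(name)  # searches $PATH for an executable with this name
    if found is None:
        return None
    # realpath follows symlinks, so a versioned install hidden behind a generic
    # link (for example a hypothetical /usr/local/cuda symlink) becomes visible.
    return found, os.path.realpath(found)


if __name__ == "__main__":
    print(resolve_tool("nv-nsight-cu-cli"))
```

If the resolved path is not under /usr/local/cuda-10.2, the shell is picking up a different installation than the one tested explicitly above.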
-
Try running the tool with elevated privileges (sudo), if not done already.
I get the same result (below) when I log in with a proper root environment as well:
sudo /usr/local/cuda-10.2/bin/nv-nsight-cu-cli ./matrixMul
[Matrix Multiply Using CUDA] - Starting…
==PROF== Connected to process 72781 (/home/tnallen/build/NVIDIA_CUDA-10.0_Samples/0_Simple/matrixMul/matrixMul)
==ERROR== The application returned an error code (11)
==WARNING== No kernels were profiled
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option
-
Let us know the output of the “locale” command in your shell.
locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
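In the locale output above LC_ALL is empty, so each category falls back to its own LC_* value or to LANG, and all of them end up as en_US.UTF-8. The POSIX precedence behind that (a non-empty LC_ALL wins, then the category's own variable, then LANG) can be sketched as follows. `effective_locale` is a hypothetical helper, and the final fallback to "C" reflects glibc's default when nothing is set.

```python
def effective_locale(env, category):
    """Return the locale name that applies to `category` (e.g. "LC_CTYPE").

    Precedence: non-empty LC_ALL, then non-empty LC_<category>, then
    non-empty LANG; if all are empty or unset, the default "C" locale.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"
```

By the same rule, running the profiler with LC_ALL=C set would override every category at once, which is one way to rule the locale out.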
-
Try collecting only a single metric, i.e. “nv-nsight-cu-cli --metrics device__attribute_display_name ./matrixMul”
nv-nsight-cu-cli --metrics device__attribute_display_name ./matrixMul
[Matrix Multiply Using CUDA] - Starting…
==PROF== Connected to process 73160 (/home/tnallen/build/NVIDIA_CUDA-10.0_Samples/0_Simple/matrixMul/matrixMul)
==ERROR== The application returned an error code (11)
==WARNING== No kernels were profiled
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option
dmesg gives me an identical message, down to the exact values, for every one of these runs:
[238884.026445] matrixMul[73163]: segfault at 38 ip 00007f52dadf5cc1 sp 00007f52d9cfd970 error 4 in libcuda-injection.so[7f52dac55000+1206000]
[238884.026455] Code: c3 0f 1f 84 00 00 00 00 00 55 53 ba 07 00 00 00 89 fb 40 0f b6 ff 48 83 ec 08 48 8d 05 18 f7 8f 02 8b 35 12 b3 83 02 48 8b 00 50 38 85 c0 74 11 48 8d 2d 01 79 2c 01 0f b7 45 08 66 83 f8 01
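A minimal Python sketch that decodes the dmesg segfault line above. It assumes an x86-64 kernel, where the "error" value follows the page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). It also computes the faulting instruction's offset inside the shared library. `decode_segfault` and `SEGFAULT_RE` are hypothetical names, and the regex only covers the line format seen here.

```python
import re

# Pattern for an x86-64 kernel segfault line as it appears in this dmesg output:
#   "<proc>[<pid>]: segfault at <addr> ip <ip> sp <sp> error <n> in <lib>[<base>+<size>]"
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<err>\d+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode one dmesg segfault line into a few more readable fields."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognized segfault line")
    err = int(m["err"])
    return {
        "library": m["lib"],
        "fault_addr": int(m["addr"], 16),  # address the code tried to access
        # Offset of the faulting instruction inside the shared library; for the
        # same library build it stays stable even if the load address changes.
        "offset_in_lib": int(m["ip"], 16) - int(m["base"], 16),
        # x86 page-fault error code bits:
        "protection": bool(err & 0x1),  # bit 0: 0 = page not present
        "write": bool(err & 0x2),       # bit 1: 0 = read, 1 = write
        "user_mode": bool(err & 0x4),   # bit 2: 1 = fault in user mode
    }
```

For the line above this reports a user-mode read of a not-present page at address 0x38, at offset 0x1a0cc1 in libcuda-injection.so. An address that close to zero typically means a structure field was read through a NULL pointer.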
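Here is some valgrind output I generated on a long shot. The profiler binary runs from a linux-desktop-glibc_2_11_3-x64 directory, while the system libc is libc-2.28.so. Could this glibc version mismatch be responsible?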
nv-nsight-cu-cli ./matrixMul
==73999== Memcheck, a memory error detector
==73999== Copyright (C) 2002-2017, and GNU GPL’d, by Julian Seward et al.
==73999== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==73999== Command: /usr/local/cuda-10.2/bin/…/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli ./matrixMul
==73999==
[Matrix Multiply Using CUDA] - Starting…
==PROF== Connected to process 74018 (/home/tnallen/build/NVIDIA_CUDA-10.0_Samples/0_Simple/matrixMul/matrixMul)
==ERROR== The application returned an error code (11)
==WARNING== No kernels were profiled
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option
==73999== Invalid read of size 1
==73999== at 0x572D99: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x5CA253: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x5CB459: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x49F2E0: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x433E45: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x42B0CB: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x41E48F: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x4A4E412: (below main) (in /usr/lib64/libc-2.28.so)
==73999== Address 0x4f5c268 is 8 bytes inside a block of size 88 free’d
==73999== at 0x4838A0C: free (vg_replace_malloc.c:540)
==73999== by 0x553DAD: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x554288: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x53212E: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x5321D7: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x530F9A: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x53A8A1: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x53EFF3: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x52E8D1: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x52EBD7: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x52ECF8: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x4DBEB2: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== Block was alloc’d at
==73999== at 0x483780B: malloc (vg_replace_malloc.c:309)
==73999== by 0xBE0787: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x57144E: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x5CDFC8: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x5CF546: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x48C173: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x428F0F: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x42AF95: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x41E48F: ??? (in /usr/local/cuda-10.2/nsight-compute-2019.5.0/target/linux-desktop-glibc_2_11_3-x64/nv-nsight-cu-cli)
==73999== by 0x4A4E412: (below main) (in /usr/lib64/libc-2.28.so)
---------- snip ----------
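To make the glibc comparison concrete, here is a small Python sketch that parses the version embedded in the tool's directory name and the version string of the running glibc. It assumes (not confirmed by NVIDIA) that "glibc_2_11_3" in the directory name denotes the glibc version the build targets. `target_glibc`, `parse_libc_version`, and `system_glibc` are hypothetical helpers, and `os.confstr("CS_GNU_LIBC_VERSION")` is glibc-specific.

```python
import os
import re


def target_glibc(dirname):
    """Parse a glibc version from a name like 'linux-desktop-glibc_2_11_3-x64'."""
    m = re.search(r"glibc_(\d+(?:_\d+)*)", dirname)
    if m is None:
        raise ValueError(f"no glibc version in {dirname!r}")
    return tuple(int(part) for part in m.group(1).split("_"))


def parse_libc_version(text):
    """Parse a string like 'glibc 2.28' into (2, 28)."""
    _, version = text.split()
    return tuple(int(part) for part in version.split("."))


def system_glibc():
    """Version of the running glibc (Linux/glibc only)."""
    return parse_libc_version(os.confstr("CS_GNU_LIBC_VERSION"))
```

On this machine `system_glibc()` would presumably return (2, 28), matching libc-2.28.so, which compares as newer than (2, 11, 3). glibc normally keeps backward compatibility through symbol versioning, so a newer system glibc on its own is usually not expected to break a binary built for an older one.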