Darknet: segmentation fault when calling cudnnBatchNormalizationForwardTraining()

GTX 1080, CUDA 9.2, cuDNN 7.1

nvidia-smi
Thu Jun 28 11:22:22 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:17:00.0 Off |                  N/A |
| 55%   75C    P2   296W / 275W |   3871MiB / 11172MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:65:00.0  On |                  N/A |
| 25%   57C    P8    23W / 275W |    689MiB / 11169MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     48131      C   ./darknet                                   3859MiB |
|    1      1161      G   /usr/lib/xorg/Xorg                           179MiB |
|    1     35254      G   /usr/lib/xorg/Xorg                            60MiB |
|    1     39659      G   /opt/teamviewer/tv_bin/TeamViewer             14MiB |
|    1     64907      G   /usr/lib/firefox/firefox                      74MiB |
|    1     78908      G   /usr/lib/xorg/Xorg                            48MiB |
+-----------------------------------------------------------------------------+

OpenCV: 3.3
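
With this many components in play (driver, CUDA runtime, cuDNN header, cuDNN library), one thing worth ruling out is a version mismatch. The sketch below is plain C with no CUDA dependency: it only decodes the integer version codes and compares them. The helper names (decode_cuda_version, decode_cudnn_version, driver_supports_runtime, cudnn_header_matches_library) are made up for illustration. The real values would come from cudaRuntimeGetVersion(), cudaDriverGetVersion(), cudnnGetVersion(), and the CUDNN_VERSION macro in cudnn.h.

```c
/* Hypothetical helpers, not part of darknet, CUDA, or cuDNN. */
typedef struct {
    int major;
    int minor;
    int patch;
} version_parts;

/* CUDA encodes versions as 1000*major + 10*minor, e.g. 9020 for 9.2.
 * This is the format written by cudaRuntimeGetVersion() and
 * cudaDriverGetVersion(). */
version_parts decode_cuda_version(int code)
{
    version_parts v = { code / 1000, (code % 1000) / 10, 0 };
    return v;
}

/* cuDNN 7.x encodes versions as 1000*major + 100*minor + patch,
 * e.g. 7104 for 7.1.4. This is the format of the CUDNN_VERSION macro
 * and of the value returned by cudnnGetVersion(). */
version_parts decode_cudnn_version(int code)
{
    version_parts v = { code / 1000, (code % 1000) / 100, code % 100 };
    return v;
}

/* cudaDriverGetVersion() reports the newest CUDA version the installed
 * driver supports, so it should be at least the runtime version. */
int driver_supports_runtime(int driver_code, int runtime_code)
{
    return driver_code >= runtime_code;
}

/* The header used at compile time (CUDNN_VERSION) and the library
 * loaded at run time (cudnnGetVersion()) should report the same code. */
int cudnn_header_matches_library(int header_code, int library_code)
{
    return header_code == library_code;
}
```

For example, driver_supports_runtime(9000, 9020) returns 0: a driver that reports CUDA 9.0 would be too old for a 9.2 runtime. Likewise, a cudnn.h from one cuDNN release combined with a libcudnn.so.7 from another would make cudnn_header_matches_library() return 0.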

  • GPU=1, OPENCV=1: OK
  • GPU=1: OK
  • GPU=1, CUDNN=1, OPENCV=1: segmentation fault during training:
> ./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74
..........
Loading weights from darknet53.conv.74...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Loaded: 0.793186 seconds
Segmentation fault (core dumped)
> gdb ./darknet ./core 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f8e18a56d0d in cudnnBatchNormalizationForwardTraining () from /usr/local/cuda/lib64/libcudnn.so.7
[Current thread is 1 (Thread 0x7f8e3acc5a40 (LWP 42992))]
(gdb) 
(gdb) where
#0  0x00007f8e18a56d0d in cudnnBatchNormalizationForwardTraining () from /usr/local/cuda/lib64/libcudnn.so.7
#1  0x00000000004b8b4c in forward_batchnorm_layer_gpu (l=..., net=...) at ./src/batchnorm_layer.c:197
#2  0x00000000004cc0e3 in forward_convolutional_layer_gpu (l=..., net=...) at ./src/convolutional_kernels.cu:127
#3  0x0000000000485776 in forward_network_gpu (netp=0x1de4970) at ./src/network.c:778
#4  0x0000000000482afa in forward_network (netp=0x1de4970) at ./src/network.c:192
#5  0x00000000004831e5 in train_network_datum (net=0x1de4970) at ./src/network.c:293
#6  0x00000000004833bf in train_network (net=0x1de4970, d=...) at ./src/network.c:324
#7  0x0000000000432685 in train_detector (datacfg=0x7ffd23a29651 "cfg/voc.data", cfgfile=0x7ffd23a2965e "cfg/yolov3-voc.cfg", 
    weightfile=0x7ffd23a29671 "darknet53.conv.74", gpus=0x7ffd23a27874, ngpus=1, clear=0) at ./examples/detector.c:118
#8  0x000000000043625b in run_detector (argc=6, argv=0x7ffd23a27a68) at ./examples/detector.c:842
#9  0x000000000043cbe9 in main (argc=6, argv=0x7ffd23a27a68) at ./examples/darknet.c:440
(gdb) bt
#0  0x00007f8e18a56d0d in cudnnBatchNormalizationForwardTraining ()
   from /usr/local/cuda/lib64/libcudnn.so.7
#1  0x00000000004b8b4c in forward_batchnorm_layer_gpu (l=..., net=...)
    at ./src/batchnorm_layer.c:197
#2  0x00000000004cc0e3 in forward_convolutional_layer_gpu (l=..., 
    net=...) at ./src/convolutional_kernels.cu:127
#3  0x0000000000485776 in forward_network_gpu (netp=0x1de4970)
    at ./src/network.c:778
#4  0x0000000000482afa in forward_network (netp=0x1de4970)
    at ./src/network.c:192
#5  0x00000000004831e5 in train_network_datum (net=0x1de4970)
    at ./src/network.c:293
#6  0x00000000004833bf in train_network (net=0x1de4970, d=...)
    at ./src/network.c:324
#7  0x0000000000432685 in train_detector (
    datacfg=0x7ffd23a29651 "cfg/voc.data", 
    cfgfile=0x7ffd23a2965e "cfg/yolov3-voc.cfg", 
    weightfile=0x7ffd23a29671 "darknet53.conv.74", gpus=0x7ffd23a27874, 
    ngpus=1, clear=0) at ./examples/detector.c:118
#8  0x000000000043625b in run_detector (argc=6, argv=0x7ffd23a27a68)
    at ./examples/detector.c:842
#9  0x000000000043cbe9 in main (argc=6, argv=0x7ffd23a27a68)
    at ./examples/darknet.c:440

The segmentation fault occurs inside the cuDNN function cudnnBatchNormalizationForwardTraining(), which is called from forward_batchnorm_layer_gpu (src/batchnorm_layer.c:197), but I have no idea how to solve it.
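
One way to narrow this down, assuming (unconfirmed) that the crash comes from an invalid host-side argument, is to check the required pointer arguments right before the call in forward_batchnorm_layer_gpu. The sketch below is self-contained C. bn_required_args and first_null_bn_arg are hypothetical names, not part of darknet or cuDNN. The fields only mirror the required pointer parameters of cudnnBatchNormalizationForwardTraining(); the running and saved mean/variance outputs are left out because cuDNN allows them to be NULL.

```c
#include <stddef.h>

/* Hypothetical container for the required pointer arguments of
 * cudnnBatchNormalizationForwardTraining(). In real code these would be
 * the handle, descriptors, and buffers that darknet passes to the call. */
typedef struct {
    const void *handle;
    const void *alpha;                   /* host pointer to scaling factor */
    const void *beta;                    /* host pointer to scaling factor */
    const void *xDesc;
    const void *x;
    const void *yDesc;
    const void *y;
    const void *bnScaleBiasMeanVarDesc;
    const void *bnScale;
    const void *bnBias;
} bn_required_args;

/* Returns the name of the first required argument that is NULL,
 * or NULL when every required pointer is set. */
const char *first_null_bn_arg(const bn_required_args *a)
{
    if (a == NULL) return "args";
    if (a->handle == NULL) return "handle";
    if (a->alpha == NULL) return "alpha";
    if (a->beta == NULL) return "beta";
    if (a->xDesc == NULL) return "xDesc";
    if (a->x == NULL) return "x";
    if (a->yDesc == NULL) return "yDesc";
    if (a->y == NULL) return "y";
    if (a->bnScaleBiasMeanVarDesc == NULL) return "bnScaleBiasMeanVarDesc";
    if (a->bnScale == NULL) return "bnScale";
    if (a->bnBias == NULL) return "bnBias";
    return NULL;
}
```

A non-NULL result names the first argument to look at. A NULL result only shows that no required pointer is missing, not that the pointers are valid, so a clean check would point back toward the installation (driver, CUDA, and cuDNN versions) rather than the arguments.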