I’m working on a TensorFlow application (using an NVIDIA GPU) in the following environment:
- OS : Ubuntu 16.04.02 LTS
- GPU: GeForce GTX 1080 Ti
- NVIDIA driver: 384.59
- CUDA ver.: 8.0.61_375.26
- cuDNN ver.: 5.1
- TensorFlow ver.: 1.2.1
- Application Language: Python
The application is run by cron, and it occasionally stops processing (about 5 times a month so far).
When this happens, a process named “[irq/125-nvidia]” uses 100% of one CPU core, and I find the following messages in /var/log/kern.log.
Does anyone know how to deal with this problem? Or should I ask the TensorFlow team instead?
Thanks.
Aug 6 11:44:07 hostname kernel: [688298.871282] NVRM: GPU at PCI:0000:01:00: GPU-628e8113-4b1e-ddf8-259c-b9e2e7653f7b
Aug 6 11:44:07 hostname kernel: [688298.871289] NVRM: GPU Board Serial Number:
Aug 6 11:44:07 hostname kernel: [688298.871295] NVRM: Xid (PCI:0000:01:00): 44, Ch 00000001, engmask 00000101, intr 10000000
Aug 6 11:47:33 hostname kernel: [688504.868263] INFO: task kworker/4:2:11603 blocked for more than 120 seconds.
Aug 6 11:47:33 hostname kernel: [688504.868270] Tainted: P OE 4.4.0-45-generic #66-Ubuntu
Aug 6 11:47:33 hostname kernel: [688504.868273] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 11:47:33 hostname kernel: [688504.868276] kworker/4:2 D ffff880403c27b78 0 11603 2 0x00000000
Aug 6 11:47:33 hostname kernel: [688504.868616] Workqueue: events os_execute_work_item [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.868621] ffff880403c27b78 ffff8803b60a4008 ffff8800ad79bb00 ffff8803ff648000
Aug 6 11:47:33 hostname kernel: [688504.868627] ffff880403c28000 ffff8800bffe0e28 ffff8803ff648000 ffff880403c27e18
Aug 6 11:47:33 hostname kernel: [688504.868632] ffff8803fe8cf788 ffff880403c27b90 ffffffff8182d7c5 7fffffffffffffff
Aug 6 11:47:33 hostname kernel: [688504.868637] Call Trace:
Aug 6 11:47:33 hostname kernel: [688504.868650] [<ffffffff8182d7c5>] schedule+0x35/0x80
Aug 6 11:47:33 hostname kernel: [688504.868656] [<ffffffff818308e5>] schedule_timeout+0x1b5/0x270
Aug 6 11:47:33 hostname kernel: [688504.868662] [<ffffffff810b4ec3>] ? update_curr+0xe3/0x160
Aug 6 11:47:33 hostname kernel: [688504.868670] [<ffffffff810b27bc>] ? __enqueue_entity+0x6c/0x70
Aug 6 11:47:33 hostname kernel: [688504.868675] [<ffffffff810b9597>] ? put_prev_entity+0x97/0x7d0
Aug 6 11:47:33 hostname kernel: [688504.868680] [<ffffffff8182f87f>] __down+0x7f/0xd0
Aug 6 11:47:33 hostname kernel: [688504.868685] [<ffffffff810ca131>] down+0x41/0x50
Aug 6 11:47:33 hostname kernel: [688504.868992] [<ffffffffc0665d87>] os_acquire_semaphore+0x37/0x40 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.869272] [<ffffffffc0665d9e>] os_acquire_mutex+0xe/0x10 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.869760] [<ffffffffc0c320cc>] _nv031494rm+0x5c/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.870472] [<ffffffffc0a95428>] ? _nv006986rm+0x38/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.870911] [<ffffffffc0cad6db>] ? _nv001136rm+0x6b/0xd0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.871347] [<ffffffffc0cb1409>] ? rm_execute_work_item+0x49/0xc0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.871632] [<ffffffffc0666101>] ? os_free_mutex+0x1/0x20 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.871912] [<ffffffffc0666166>] ? os_execute_work_item+0x46/0x70 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.871921] [<ffffffff8109a3e5>] ? process_one_work+0x165/0x480
Aug 6 11:47:33 hostname kernel: [688504.871927] [<ffffffff8109a74b>] ? worker_thread+0x4b/0x4c0
Aug 6 11:47:33 hostname kernel: [688504.871933] [<ffffffff8109a700>] ? process_one_work+0x480/0x480
Aug 6 11:47:33 hostname kernel: [688504.871938] [<ffffffff810a0928>] ? kthread+0xd8/0xf0
Aug 6 11:47:33 hostname kernel: [688504.871943] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.871949] [<ffffffff81831c4f>] ? ret_from_fork+0x3f/0x70
Aug 6 11:47:33 hostname kernel: [688504.871953] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.871959] INFO: task kworker/4:3:11619 blocked for more than 120 seconds.
Aug 6 11:47:33 hostname kernel: [688504.871963] Tainted: P OE 4.4.0-45-generic #66-Ubuntu
Aug 6 11:47:33 hostname kernel: [688504.871966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 11:47:33 hostname kernel: [688504.871969] kworker/4:3 D ffff88009b61fb78 0 11619 2 0x00000000
Aug 6 11:47:33 hostname kernel: [688504.872252] Workqueue: events os_execute_work_item [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.872300] ffff88009b61fb78 0000000000000000 ffff8803ff648000 ffff880392b10ec0
Aug 6 11:47:33 hostname kernel: [688504.872309] ffff88009b620000 ffff8800bffe0e28 ffff880392b10ec0 ffff88009b61fe18
Aug 6 11:47:33 hostname kernel: [688504.872317] ffff8803fe8cf748 ffff88009b61fb90 ffffffff8182d7c5 7fffffffffffffff
Aug 6 11:47:33 hostname kernel: [688504.872331] Call Trace:
Aug 6 11:47:33 hostname kernel: [688504.872349] [<ffffffff8182d7c5>] schedule+0x35/0x80
Aug 6 11:47:33 hostname kernel: [688504.872366] [<ffffffff818308e5>] schedule_timeout+0x1b5/0x270
Aug 6 11:47:33 hostname kernel: [688504.872377] [<ffffffff8182f87f>] __down+0x7f/0xd0
Aug 6 11:47:33 hostname kernel: [688504.872388] [<ffffffff810ca131>] down+0x41/0x50
Aug 6 11:47:33 hostname kernel: [688504.872640] [<ffffffffc0665d87>] os_acquire_semaphore+0x37/0x40 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.872898] [<ffffffffc0665d9e>] os_acquire_mutex+0xe/0x10 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.873368] [<ffffffffc0c320cc>] _nv031494rm+0x5c/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.874053] [<ffffffffc0a95428>] ? _nv006986rm+0x38/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.874485] [<ffffffffc0cad6db>] ? _nv001136rm+0x6b/0xd0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.874917] [<ffffffffc0cb1409>] ? rm_execute_work_item+0x49/0xc0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.875178] [<ffffffffc0666100>] ? os_free_mem+0x30/0x30 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.875414] [<ffffffffc0666166>] ? os_execute_work_item+0x46/0x70 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.875422] [<ffffffff8109a3e5>] ? process_one_work+0x165/0x480
Aug 6 11:47:33 hostname kernel: [688504.875428] [<ffffffff8109a74b>] ? worker_thread+0x4b/0x4c0
Aug 6 11:47:33 hostname kernel: [688504.875433] [<ffffffff8109a700>] ? process_one_work+0x480/0x480
Aug 6 11:47:33 hostname kernel: [688504.875439] [<ffffffff8109a700>] ? process_one_work+0x480/0x480
Aug 6 11:47:33 hostname kernel: [688504.875445] [<ffffffff810a0928>] ? kthread+0xd8/0xf0
Aug 6 11:47:33 hostname kernel: [688504.875450] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.875456] [<ffffffff81831c4f>] ? ret_from_fork+0x3f/0x70
Aug 6 11:47:33 hostname kernel: [688504.875460] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.875469] INFO: task python:12954 blocked for more than 120 seconds.
Aug 6 11:47:33 hostname kernel: [688504.875473] Tainted: P OE 4.4.0-45-generic #66-Ubuntu
Aug 6 11:47:33 hostname kernel: [688504.875475] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 11:47:33 hostname kernel: [688504.875477] python D ffff8803fe7b3928 0 12954 12953 0x00000000
Aug 6 11:47:33 hostname kernel: [688504.875483] ffff8803fe7b3928 ffffffffc0bd530c ffff88042470bb00 ffff880421a4ac40
Aug 6 11:47:33 hostname kernel: [688504.875491] ffff8803fe7b4000 ffff8800bffe0e28 ffff880421a4ac40 ffff8803fe7b3bd0
Aug 6 11:47:33 hostname kernel: [688504.875497] ffff880403044008 ffff8803fe7b3940 ffffffff8182d7c5 7fffffffffffffff
Aug 6 11:47:33 hostname kernel: [688504.875502] Call Trace:
Aug 6 11:47:33 hostname kernel: [688504.875994] [<ffffffffc0bd530c>] ? _nv030836rm+0xc/0x20 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.876004] [<ffffffff8182d7c5>] schedule+0x35/0x80
Aug 6 11:47:33 hostname kernel: [688504.876009] [<ffffffff818308e5>] schedule_timeout+0x1b5/0x270
Aug 6 11:47:33 hostname kernel: [688504.876496] [<ffffffffc0c06998>] ? _nv020294rm+0x8/0x40 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.876976] [<ffffffffc0c0737d>] ? _nv020343rm+0xd/0x40 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.877458] [<ffffffffc0c074f4>] ? _nv020376rm+0x34/0xd0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.877928] [<ffffffffc0c32f59>] ? _nv006261rm+0x109/0x240 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.877945] [<ffffffff8182f87f>] __down+0x7f/0xd0
Aug 6 11:47:33 hostname kernel: [688504.877956] [<ffffffff810ca131>] down+0x41/0x50
Aug 6 11:47:33 hostname kernel: [688504.878198] [<ffffffffc0665d87>] os_acquire_semaphore+0x37/0x40 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.878456] [<ffffffffc0665d9e>] os_acquire_mutex+0xe/0x10 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.878919] [<ffffffffc0c320cc>] _nv031494rm+0x5c/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.879340] [<ffffffffc0cb3a30>] ? rm_get_gpu_uuid_raw+0x70/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.879349] [<ffffffff810ac101>] ? try_to_wake_up+0x361/0x3b0
Aug 6 11:47:33 hostname kernel: [688504.879593] [<ffffffffc065b999>] ? nv_open_device+0x579/0x700 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.879839] [<ffffffffc065be8d>] ? nvidia_open+0x14d/0x2f0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.880084] [<ffffffffc065a328>] ? nvidia_frontend_open+0x58/0xa0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.880093] [<ffffffff8121218f>] ? chrdev_open+0xbf/0x1b0
Aug 6 11:47:33 hostname kernel: [688504.880109] [<ffffffff8120b2ef>] ? do_dentry_open+0x1ff/0x310
Aug 6 11:47:33 hostname kernel: [688504.880114] [<ffffffff812120d0>] ? cdev_put+0x30/0x30
Aug 6 11:47:33 hostname kernel: [688504.880120] [<ffffffff8120c484>] ? vfs_open+0x54/0x80
Aug 6 11:47:33 hostname kernel: [688504.880127] [<ffffffff812180eb>] ? may_open+0x5b/0xf0
Aug 6 11:47:33 hostname kernel: [688504.880135] [<ffffffff8121bc97>] ? path_openat+0x1b7/0x1330
Aug 6 11:47:33 hostname kernel: [688504.880141] [<ffffffff8121cf84>] ? putname+0x54/0x60
Aug 6 11:47:33 hostname kernel: [688504.880149] [<ffffffff8121e001>] ? do_filp_open+0x91/0x100
Aug 6 11:47:33 hostname kernel: [688504.880157] [<ffffffff8122b8c6>] ? __alloc_fd+0x46/0x190
Aug 6 11:47:33 hostname kernel: [688504.880162] [<ffffffff8120c858>] ? do_sys_open+0x138/0x2a0
Aug 6 11:47:33 hostname kernel: [688504.880168] [<ffffffff8120c9de>] ? SyS_open+0x1e/0x20
Aug 6 11:47:33 hostname kernel: [688504.880174] [<ffffffff818318b2>] ? entry_SYSCALL_64_fastpath+0x16/0x71
Aug 6 11:47:33 hostname kernel: [688504.880182] INFO: task python:12956 blocked for more than 120 seconds.
Aug 6 11:47:33 hostname kernel: [688504.880209] Tainted: P OE 4.4.0-45-generic #66-Ubuntu
Aug 6 11:47:33 hostname kernel: [688504.880220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 11:47:33 hostname kernel: [688504.880228] python D ffff8803c793bad8 0 12956 12955 0x00000000
Aug 6 11:47:33 hostname kernel: [688504.880249] ffff8803c793bad8 00000000022152c0 ffffffff81e11500 ffff880421f249c0
Aug 6 11:47:33 hostname kernel: [688504.880266] ffff8803c793c000 ffffffffc11740c0 ffff880421f249c0 ffff880423755b00
Aug 6 11:47:33 hostname kernel: [688504.880271] ffffffff821cd240 ffff8803c793baf0 ffffffff8182d7c5 7fffffffffffffff
Aug 6 11:47:33 hostname kernel: [688504.880275] Call Trace:
Aug 6 11:47:33 hostname kernel: [688504.880281] [<ffffffff8182d7c5>] schedule+0x35/0x80
Aug 6 11:47:33 hostname kernel: [688504.880286] [<ffffffff818308e5>] schedule_timeout+0x1b5/0x270
Aug 6 11:47:33 hostname kernel: [688504.880291] [<ffffffff81225557>] ? __d_instantiate+0x97/0xf0
Aug 6 11:47:33 hostname kernel: [688504.880295] [<ffffffff8182f87f>] __down+0x7f/0xd0
Aug 6 11:47:33 hostname kernel: [688504.880299] [<ffffffff810ca131>] down+0x41/0x50
Aug 6 11:47:33 hostname kernel: [688504.880495] [<ffffffffc065a2f5>] nvidia_frontend_open+0x25/0xa0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.880511] [<ffffffff8121218f>] chrdev_open+0xbf/0x1b0
Aug 6 11:47:33 hostname kernel: [688504.880520] [<ffffffff8120b2ef>] do_dentry_open+0x1ff/0x310
Aug 6 11:47:33 hostname kernel: [688504.880533] [<ffffffff812120d0>] ? cdev_put+0x30/0x30
Aug 6 11:47:33 hostname kernel: [688504.880537] [<ffffffff8120c484>] vfs_open+0x54/0x80
Aug 6 11:47:33 hostname kernel: [688504.880542] [<ffffffff812180eb>] ? may_open+0x5b/0xf0
Aug 6 11:47:33 hostname kernel: [688504.880548] [<ffffffff8121bc97>] path_openat+0x1b7/0x1330
Aug 6 11:47:33 hostname kernel: [688504.880554] [<ffffffff8121cf84>] ? putname+0x54/0x60
Aug 6 11:47:33 hostname kernel: [688504.880560] [<ffffffff8121e001>] do_filp_open+0x91/0x100
Aug 6 11:47:33 hostname kernel: [688504.880564] [<ffffffff8122b8c6>] ? __alloc_fd+0x46/0x190
Aug 6 11:47:33 hostname kernel: [688504.880569] [<ffffffff8120c858>] do_sys_open+0x138/0x2a0
Aug 6 11:47:33 hostname kernel: [688504.880575] [<ffffffff8120c9de>] SyS_open+0x1e/0x20
Aug 6 11:47:33 hostname kernel: [688504.880584] [<ffffffff818318b2>] entry_SYSCALL_64_fastpath+0x16/0x71
Aug 6 11:47:33 hostname kernel: [688504.880596] INFO: task kworker/4:0:13016 blocked for more than 120 seconds.
Aug 6 11:47:33 hostname kernel: [688504.880604] Tainted: P OE 4.4.0-45-generic #66-Ubuntu
Aug 6 11:47:33 hostname kernel: [688504.880610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 11:47:33 hostname kernel: [688504.880612] kworker/4:0 D ffff8803b7c5bb88 0 13016 2 0x00000000
Aug 6 11:47:33 hostname kernel: [688504.880817] Workqueue: events os_execute_work_item [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.880826] ffff8803b7c5bb88 ffff8803b7c5bb58 ffff8800ad79c9c0 ffff8800ad79bb00
Aug 6 11:47:33 hostname kernel: [688504.880830] ffff8803b7c5c000 ffff8800bffe0e28 ffff8800ad79bb00 ffff8803b7c5be18
Aug 6 11:47:33 hostname kernel: [688504.880844] ffff8803fe8cf7c8 ffff8803b7c5bba0 ffffffff8182d7c5 7fffffffffffffff
Aug 6 11:47:33 hostname kernel: [688504.880848] Call Trace:
Aug 6 11:47:33 hostname kernel: [688504.880858] [<ffffffff8182d7c5>] schedule+0x35/0x80
Aug 6 11:47:33 hostname kernel: [688504.880862] [<ffffffff818308e5>] schedule_timeout+0x1b5/0x270
Aug 6 11:47:33 hostname kernel: [688504.880869] [<ffffffff8182d116>] ? __schedule+0x3b6/0xa30
Aug 6 11:47:33 hostname kernel: [688504.880873] [<ffffffff8182f87f>] __down+0x7f/0xd0
Aug 6 11:47:33 hostname kernel: [688504.880878] [<ffffffff810f5d00>] ? __getnstimeofday64+0x60/0xd0
Aug 6 11:47:33 hostname kernel: [688504.880882] [<ffffffff810ca131>] down+0x41/0x50
Aug 6 11:47:33 hostname kernel: [688504.881083] [<ffffffffc0665d87>] os_acquire_semaphore+0x37/0x40 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.881269] [<ffffffffc0665d9e>] os_acquire_mutex+0xe/0x10 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.881637] [<ffffffffc0c320cc>] _nv031494rm+0x5c/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.881959] [<ffffffffc0803799>] ? _nv012470rm+0x29/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.882298] [<ffffffffc0cad6db>] ? _nv001136rm+0x6b/0xd0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.882637] [<ffffffffc0cb1409>] ? rm_execute_work_item+0x49/0xc0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.882825] [<ffffffffc0666100>] ? os_free_mem+0x30/0x30 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.883010] [<ffffffffc0666166>] ? os_execute_work_item+0x46/0x70 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.883018] [<ffffffff8109a3e5>] ? process_one_work+0x165/0x480
Aug 6 11:47:33 hostname kernel: [688504.883023] [<ffffffff8109a74b>] ? worker_thread+0x4b/0x4c0
Aug 6 11:47:33 hostname kernel: [688504.883027] [<ffffffff8109a700>] ? process_one_work+0x480/0x480
Aug 6 11:47:33 hostname kernel: [688504.883031] [<ffffffff810a0928>] ? kthread+0xd8/0xf0
Aug 6 11:47:33 hostname kernel: [688504.883036] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.883041] [<ffffffff81831c4f>] ? ret_from_fork+0x3f/0x70
Aug 6 11:47:33 hostname kernel: [688504.883044] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.883049] INFO: task kworker/4:1:13017 blocked for more than 120 seconds.
Aug 6 11:47:33 hostname kernel: [688504.883052] Tainted: P OE 4.4.0-45-generic #66-Ubuntu
Aug 6 11:47:33 hostname kernel: [688504.883056] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 11:47:33 hostname kernel: [688504.883058] kworker/4:1 D ffff88040443fb88 0 13017 2 0x00000000
Aug 6 11:47:33 hostname kernel: [688504.883258] Workqueue: events os_execute_work_item [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.883263] ffff88040443fb88 0000000000000001 ffff8800ad79d880 ffff8800ad79c9c0
Aug 6 11:47:33 hostname kernel: [688504.883267] ffff880404440000 ffff8800bffe0e28 ffff8800ad79c9c0 ffff88040443fe18
Aug 6 11:47:33 hostname kernel: [688504.883271] ffff8803fe8cf108 ffff88040443fba0 ffffffff8182d7c5 7fffffffffffffff
Aug 6 11:47:33 hostname kernel: [688504.883275] Call Trace:
Aug 6 11:47:33 hostname kernel: [688504.883281] [<ffffffff8182d7c5>] schedule+0x35/0x80
Aug 6 11:47:33 hostname kernel: [688504.883287] [<ffffffff818308e5>] schedule_timeout+0x1b5/0x270
Aug 6 11:47:33 hostname kernel: [688504.883293] [<ffffffff810b27bc>] ? __enqueue_entity+0x6c/0x70
Aug 6 11:47:33 hostname kernel: [688504.883296] [<ffffffff810b9597>] ? put_prev_entity+0x97/0x7d0
Aug 6 11:47:33 hostname kernel: [688504.883302] [<ffffffff8102d66c>] ? __switch_to+0x1dc/0x5c0
Aug 6 11:47:33 hostname kernel: [688504.883307] [<ffffffff8182f87f>] __down+0x7f/0xd0
Aug 6 11:47:33 hostname kernel: [688504.883312] [<ffffffff810f5d00>] ? __getnstimeofday64+0x60/0xd0
Aug 6 11:47:33 hostname kernel: [688504.883315] [<ffffffff810ca131>] down+0x41/0x50
Aug 6 11:47:33 hostname kernel: [688504.883514] [<ffffffffc0665d87>] os_acquire_semaphore+0x37/0x40 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.883713] [<ffffffffc0665d9e>] os_acquire_mutex+0xe/0x10 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.884084] [<ffffffffc0c320cc>] _nv031494rm+0x5c/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.884405] [<ffffffffc0803799>] ? _nv012470rm+0x29/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.884749] [<ffffffffc0cad6db>] ? _nv001136rm+0x6b/0xd0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.885099] [<ffffffffc0cb1409>] ? rm_execute_work_item+0x49/0xc0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.885295] [<ffffffffc0666100>] ? os_free_mem+0x30/0x30 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.885489] [<ffffffffc0666166>] ? os_execute_work_item+0x46/0x70 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.885504] [<ffffffff8109a3e5>] ? process_one_work+0x165/0x480
Aug 6 11:47:33 hostname kernel: [688504.885514] [<ffffffff8109a74b>] ? worker_thread+0x4b/0x4c0
Aug 6 11:47:33 hostname kernel: [688504.885529] [<ffffffff8109a700>] ? process_one_work+0x480/0x480
Aug 6 11:47:33 hostname kernel: [688504.885543] [<ffffffff810a0928>] ? kthread+0xd8/0xf0
Aug 6 11:47:33 hostname kernel: [688504.885551] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.885565] [<ffffffff81831c4f>] ? ret_from_fork+0x3f/0x70
Aug 6 11:47:33 hostname kernel: [688504.885573] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.885584] INFO: task kworker/4:4:13018 blocked for more than 120 seconds.
Aug 6 11:47:33 hostname kernel: [688504.885592] Tainted: P OE 4.4.0-45-generic #66-Ubuntu
Aug 6 11:47:33 hostname kernel: [688504.885599] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 11:47:33 hostname kernel: [688504.885609] kworker/4:4 D ffff880404433b88 0 13018 2 0x00000000
Aug 6 11:47:33 hostname kernel: [688504.885809] Workqueue: events os_execute_work_item [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.885816] ffff880404433b88 0000000000000001 ffff8800ad79e740 ffff8800ad79d880
Aug 6 11:47:33 hostname kernel: [688504.885824] ffff880404434000 ffff8800bffe0e28 ffff8800ad79d880 ffff880404433e18
Aug 6 11:47:33 hostname kernel: [688504.885830] ffff8803fe8cf0c8 ffff880404433ba0 ffffffff8182d7c5 7fffffffffffffff
Aug 6 11:47:33 hostname kernel: [688504.885840] Call Trace:
Aug 6 11:47:33 hostname kernel: [688504.885846] [<ffffffff8182d7c5>] schedule+0x35/0x80
Aug 6 11:47:33 hostname kernel: [688504.885850] [<ffffffff818308e5>] schedule_timeout+0x1b5/0x270
Aug 6 11:47:33 hostname kernel: [688504.885856] [<ffffffff810b27bc>] ? __enqueue_entity+0x6c/0x70
Aug 6 11:47:33 hostname kernel: [688504.885861] [<ffffffff810b9597>] ? put_prev_entity+0x97/0x7d0
Aug 6 11:47:33 hostname kernel: [688504.885865] [<ffffffff8182f87f>] __down+0x7f/0xd0
Aug 6 11:47:33 hostname kernel: [688504.885870] [<ffffffff810f5d00>] ? __getnstimeofday64+0x60/0xd0
Aug 6 11:47:33 hostname kernel: [688504.885873] [<ffffffff810ca131>] down+0x41/0x50
Aug 6 11:47:33 hostname kernel: [688504.886059] [<ffffffffc0665d87>] os_acquire_semaphore+0x37/0x40 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.886244] [<ffffffffc0665d9e>] os_acquire_mutex+0xe/0x10 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.886613] [<ffffffffc0c320cc>] _nv031494rm+0x5c/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.886936] [<ffffffffc0803799>] ? _nv012470rm+0x29/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.887274] [<ffffffffc0cad6db>] ? _nv001136rm+0x6b/0xd0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.887612] [<ffffffffc0cb1409>] ? rm_execute_work_item+0x49/0xc0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.887815] [<ffffffffc0666100>] ? os_free_mem+0x30/0x30 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.888002] [<ffffffffc0666166>] ? os_execute_work_item+0x46/0x70 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.888010] [<ffffffff8109a3e5>] ? process_one_work+0x165/0x480
Aug 6 11:47:33 hostname kernel: [688504.888014] [<ffffffff8109a74b>] ? worker_thread+0x4b/0x4c0
Aug 6 11:47:33 hostname kernel: [688504.888019] [<ffffffff8109a700>] ? process_one_work+0x480/0x480
Aug 6 11:47:33 hostname kernel: [688504.888023] [<ffffffff810a0928>] ? kthread+0xd8/0xf0
Aug 6 11:47:33 hostname kernel: [688504.888028] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.888033] [<ffffffff81831c4f>] ? ret_from_fork+0x3f/0x70
Aug 6 11:47:33 hostname kernel: [688504.888036] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.888040] INFO: task kworker/4:5:13019 blocked for more than 120 seconds.
Aug 6 11:47:33 hostname kernel: [688504.888042] Tainted: P OE 4.4.0-45-generic #66-Ubuntu
Aug 6 11:47:33 hostname kernel: [688504.888046] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 11:47:33 hostname kernel: [688504.888048] kworker/4:5 D ffff8804237f7b88 0 13019 2 0x00000000
Aug 6 11:47:33 hostname kernel: [688504.888237] Workqueue: events os_execute_work_item [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.888242] ffff8804237f7b88 ffffffff810b9535 ffff8800ad798000 ffff8800ad79e740
Aug 6 11:47:33 hostname kernel: [688504.888244] ffff8804237f8000 ffff8800bffe0e28 ffff8800ad79e740 ffff8804237f7e18
Aug 6 11:47:33 hostname kernel: [688504.888247] ffff8803fe8cf088 ffff8804237f7ba0 ffffffff8182d7c5 7fffffffffffffff
Aug 6 11:47:33 hostname kernel: [688504.888250] Call Trace:
Aug 6 11:47:33 hostname kernel: [688504.888252] [<ffffffff810b9535>] ? put_prev_entity+0x35/0x7d0
Aug 6 11:47:33 hostname kernel: [688504.888257] [<ffffffff8182d7c5>] schedule+0x35/0x80
Aug 6 11:47:33 hostname kernel: [688504.888259] [<ffffffff818308e5>] schedule_timeout+0x1b5/0x270
Aug 6 11:47:33 hostname kernel: [688504.888262] [<ffffffff8182d116>] ? __schedule+0x3b6/0xa30
Aug 6 11:47:33 hostname kernel: [688504.888265] [<ffffffff8182f87f>] __down+0x7f/0xd0
Aug 6 11:47:33 hostname kernel: [688504.888268] [<ffffffff810f5d00>] ? __getnstimeofday64+0x60/0xd0
Aug 6 11:47:33 hostname kernel: [688504.888271] [<ffffffff810ca131>] down+0x41/0x50
Aug 6 11:47:33 hostname kernel: [688504.888365] [<ffffffffc0665d87>] os_acquire_semaphore+0x37/0x40 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.888431] [<ffffffffc0665d9e>] os_acquire_mutex+0xe/0x10 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.888557] [<ffffffffc0c320cc>] _nv031494rm+0x5c/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.888666] [<ffffffffc0803799>] ? _nv012470rm+0x29/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.888781] [<ffffffffc0cad6db>] ? _nv001136rm+0x6b/0xd0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.888896] [<ffffffffc0cb1409>] ? rm_execute_work_item+0x49/0xc0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.888966] [<ffffffffc0666100>] ? os_free_mem+0x30/0x30 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889030] [<ffffffffc0666166>] ? os_execute_work_item+0x46/0x70 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889035] [<ffffffff8109a3e5>] ? process_one_work+0x165/0x480
Aug 6 11:47:33 hostname kernel: [688504.889038] [<ffffffff8109a74b>] ? worker_thread+0x4b/0x4c0
Aug 6 11:47:33 hostname kernel: [688504.889042] [<ffffffff8109a700>] ? process_one_work+0x480/0x480
Aug 6 11:47:33 hostname kernel: [688504.889045] [<ffffffff810a0928>] ? kthread+0xd8/0xf0
Aug 6 11:47:33 hostname kernel: [688504.889048] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.889049] [<ffffffff81831c4f>] ? ret_from_fork+0x3f/0x70
Aug 6 11:47:33 hostname kernel: [688504.889051] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.889052] INFO: task kworker/4:6:13020 blocked for more than 120 seconds.
Aug 6 11:47:33 hostname kernel: [688504.889053] Tainted: P OE 4.4.0-45-generic #66-Ubuntu
Aug 6 11:47:33 hostname kernel: [688504.889054] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 11:47:33 hostname kernel: [688504.889055] kworker/4:6 D ffff88041f4a7b88 0 13020 2 0x00000000
Aug 6 11:47:33 hostname kernel: [688504.889116] Workqueue: events os_execute_work_item [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889121] ffff88041f4a7b88 ffffffff810b9535 ffff8800ad798ec0 ffff8800ad798000
Aug 6 11:47:33 hostname kernel: [688504.889125] ffff88041f4a8000 ffff8800bffe0e28 ffff8800ad798000 ffff88041f4a7e18
Aug 6 11:47:33 hostname kernel: [688504.889126] ffff8803fe8cf048 ffff88041f4a7ba0 ffffffff8182d7c5 7fffffffffffffff
Aug 6 11:47:33 hostname kernel: [688504.889127] Call Trace:
Aug 6 11:47:33 hostname kernel: [688504.889129] [<ffffffff810b9535>] ? put_prev_entity+0x35/0x7d0
Aug 6 11:47:33 hostname kernel: [688504.889132] [<ffffffff8182d7c5>] schedule+0x35/0x80
Aug 6 11:47:33 hostname kernel: [688504.889136] [<ffffffff818308e5>] schedule_timeout+0x1b5/0x270
Aug 6 11:47:33 hostname kernel: [688504.889137] [<ffffffff8182f87f>] __down+0x7f/0xd0
Aug 6 11:47:33 hostname kernel: [688504.889139] [<ffffffff810f5d00>] ? __getnstimeofday64+0x60/0xd0
Aug 6 11:47:33 hostname kernel: [688504.889143] [<ffffffff810ca131>] down+0x41/0x50
Aug 6 11:47:33 hostname kernel: [688504.889204] [<ffffffffc0665d87>] os_acquire_semaphore+0x37/0x40 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889267] [<ffffffffc0665d9e>] os_acquire_mutex+0xe/0x10 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889390] [<ffffffffc0c320cc>] _nv031494rm+0x5c/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889496] [<ffffffffc0803799>] ? _nv012470rm+0x29/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889609] [<ffffffffc0cad6db>] ? _nv001136rm+0x6b/0xd0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889722] [<ffffffffc0cb1409>] ? rm_execute_work_item+0x49/0xc0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889784] [<ffffffffc0666100>] ? os_free_mem+0x30/0x30 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889846] [<ffffffffc0666166>] ? os_execute_work_item+0x46/0x70 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889848] [<ffffffff8109a3e5>] ? process_one_work+0x165/0x480
Aug 6 11:47:33 hostname kernel: [688504.889850] [<ffffffff8109a74b>] ? worker_thread+0x4b/0x4c0
Aug 6 11:47:33 hostname kernel: [688504.889851] [<ffffffff8109a700>] ? process_one_work+0x480/0x480
Aug 6 11:47:33 hostname kernel: [688504.889852] [<ffffffff810a0928>] ? kthread+0xd8/0xf0
Aug 6 11:47:33 hostname kernel: [688504.889854] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.889856] [<ffffffff81831c4f>] ? ret_from_fork+0x3f/0x70
Aug 6 11:47:33 hostname kernel: [688504.889857] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.889858] INFO: task kworker/4:7:13021 blocked for more than 120 seconds.
Aug 6 11:47:33 hostname kernel: [688504.889859] Tainted: P OE 4.4.0-45-generic #66-Ubuntu
Aug 6 11:47:33 hostname kernel: [688504.889860] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 11:47:33 hostname kernel: [688504.889861] kworker/4:7 D ffff88042476fb88 0 13021 2 0x00000000
Aug 6 11:47:33 hostname kernel: [688504.889922] Workqueue: events os_execute_work_item [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.889923] ffff88042476fb88 ffff88042476fb58 ffff880421f25880 ffff8800ad798ec0
Aug 6 11:47:33 hostname kernel: [688504.889925] ffff880424770000 ffff8800bffe0e28 ffff8800ad798ec0 ffff88042476fe18
Aug 6 11:47:33 hostname kernel: [688504.889926] ffff8803fe8cf008 ffff88042476fba0 ffffffff8182d7c5 7fffffffffffffff
Aug 6 11:47:33 hostname kernel: [688504.889928] Call Trace:
Aug 6 11:47:33 hostname kernel: [688504.889930] [<ffffffff8182d7c5>] schedule+0x35/0x80
Aug 6 11:47:33 hostname kernel: [688504.889931] [<ffffffff818308e5>] schedule_timeout+0x1b5/0x270
Aug 6 11:47:33 hostname kernel: [688504.889933] [<ffffffff8182d116>] ? __schedule+0x3b6/0xa30
Aug 6 11:47:33 hostname kernel: [688504.889934] [<ffffffff8182f87f>] __down+0x7f/0xd0
Aug 6 11:47:33 hostname kernel: [688504.889936] [<ffffffff810f5d00>] ? __getnstimeofday64+0x60/0xd0
Aug 6 11:47:33 hostname kernel: [688504.889938] [<ffffffff810ca131>] down+0x41/0x50
Aug 6 11:47:33 hostname kernel: [688504.889999] [<ffffffffc0665d87>] os_acquire_semaphore+0x37/0x40 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.890060] [<ffffffffc0665d9e>] os_acquire_mutex+0xe/0x10 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.890183] [<ffffffffc0c320cc>] _nv031494rm+0x5c/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.890289] [<ffffffffc0803799>] ? _nv012470rm+0x29/0x120 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.890402] [<ffffffffc0cad6db>] ? _nv001136rm+0x6b/0xd0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.890514] [<ffffffffc0cb1409>] ? rm_execute_work_item+0x49/0xc0 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.890581] [<ffffffffc0666100>] ? os_free_mem+0x30/0x30 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.890643] [<ffffffffc0666166>] ? os_execute_work_item+0x46/0x70 [nvidia]
Aug 6 11:47:33 hostname kernel: [688504.890646] [<ffffffff8109a3e5>] ? process_one_work+0x165/0x480
Aug 6 11:47:33 hostname kernel: [688504.890647] [<ffffffff8109a74b>] ? worker_thread+0x4b/0x4c0
Aug 6 11:47:33 hostname kernel: [688504.890649] [<ffffffff8109a700>] ? process_one_work+0x480/0x480
Aug 6 11:47:33 hostname kernel: [688504.890650] [<ffffffff810a0928>] ? kthread+0xd8/0xf0
Aug 6 11:47:33 hostname kernel: [688504.890652] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
Aug 6 11:47:33 hostname kernel: [688504.890653] [<ffffffff81831c4f>] ? ret_from_fork+0x3f/0x70
Aug 6 11:47:33 hostname kernel: [688504.890654] [<ffffffff810a0850>] ? kthread_create_on_node+0x1e0/0x1e0
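
To check how often this happens, the Xid and hung-task lines can be pulled out of the log with a short script. This is only a sketch: the function name `summarize` and both regular expressions are my own, written against the line format shown in the log above, and are not anything provided by the NVIDIA driver or TensorFlow.

```python
import re

# Matches NVRM Xid lines in the format seen in the log above, e.g.
# "NVRM: Xid (PCI:0000:01:00): 44, Ch 00000001, engmask 00000101, intr 10000000"
XID_RE = re.compile(r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+)")

# Matches hung-task reports in the format seen in the log above, e.g.
# "INFO: task python:12954 blocked for more than 120 seconds."
# The task name may itself contain colons (kworker/4:2), so the greedy
# name group leaves only the final ":<digits>" for the PID.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than \d+ seconds"
)


def summarize(lines):
    """Return (xid_events, hung_tasks) found in an iterable of log lines.

    xid_events is a list of (pci_address, xid_number) tuples.
    hung_tasks is a list of (task_name, pid) tuples.
    """
    xid_events = []
    hung_tasks = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            xid_events.append((match.group("pci"), int(match.group("xid"))))
            continue
        match = HUNG_RE.search(line)
        if match:
            hung_tasks.append((match.group("name"), int(match.group("pid"))))
    return xid_events, hung_tasks
```

It can be fed an open file directly, for example `summarize(open("/var/log/kern.log"))`, assuming the user running it is allowed to read that file.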