System hang on Linux when running batchCUBLAS

I’m experiencing a system hang on Linux when I run the batchCUBLAS sample from the CUDA Toolkit 8.0.

This is my kernel:
Linux version 2.6.32-504.16.2.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) ) #1 SMP Wed Apr 22 06:48:29 UTC 2015

Driver info:
[root@compute-1-5 log]# modinfo nvidia
filename: /lib/modules/2.6.32-504.16.2.el6.x86_64/kernel/drivers/video/nvidia.ko
alias: char-major-195-*
version: 367.44
supported: external
license: NVIDIA
srcversion: 62594DC43B355E37A82EB4C
alias: pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias: pci:v000010DEd*sv*sd*bc03sc02i00*
alias: pci:v000010DEd*sv*sd*bc03sc00i00*
depends: i2c-core
vermagic: 2.6.32-504.16.2.el6.x86_64 SMP mod_unload modversions
parm: NVreg_Mobile:int
parm: NVreg_ResmanDebugLevel:int
parm: NVreg_RmLogonRC:int
parm: NVreg_ModifyDeviceFiles:int
parm: NVreg_DeviceFileUID:int
parm: NVreg_DeviceFileGID:int
parm: NVreg_DeviceFileMode:int
parm: NVreg_UpdateMemoryTypes:int
parm: NVreg_InitializeSystemMemoryAllocations:int
parm: NVreg_UsePageAttributeTable:int
parm: NVreg_MapRegistersEarly:int
parm: NVreg_RegisterForACPIEvents:int
parm: NVreg_CheckPCIConfigSpace:int
parm: NVreg_EnablePCIeGen3:int
parm: NVreg_EnableMSI:int
parm: NVreg_TCEBypassMode:int
parm: NVreg_UseThreadedInterrupts:int
parm: NVreg_MemoryPoolSize:int
parm: NVreg_RegistryDwords:charp
parm: NVreg_RmMsg:charp
parm: NVreg_AssignGpus:charp

This is my /var/log/messages when it fails:

Oct 12 17:02:34 compute-1-5 kernel: nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 244
Oct 12 17:27:39 compute-1-5 kernel: INFO: task events/0:131 blocked for more than 120 seconds.
Oct 12 17:27:39 compute-1-5 kernel: Tainted: P --------------- 2.6.32-504.16.2.el6.x86_64 #1
Oct 12 17:27:39 compute-1-5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 17:27:39 compute-1-5 kernel: events/0 D 0000000000000000 0 131 2 0x00000000
Oct 12 17:27:39 compute-1-5 kernel: ffff8808747e1bb0 0000000000000046 0000000000000000 ffffffffa0406dfc
Oct 12 17:27:39 compute-1-5 kernel: ffff88083b88f2c0 ffffffffa0406e98 0000014676998e4e 0000000000000000
Oct 12 17:27:39 compute-1-5 kernel: 0000000000000000 000000010010c846 ffff8808747dfad8 ffff8808747e1fd8
Oct 12 17:27:39 compute-1-5 kernel: Call Trace:
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv011452rm+0xac/0x190 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv011452rm+0x148/0x190 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] schedule_timeout+0x215/0x2e0
Oct 12 17:27:39 compute-1-5 kernel: [] ? cpumask_next_and+0x29/0x50
Oct 12 17:27:39 compute-1-5 kernel: [] ? find_busiest_group+0x8fe/0x9e0
Oct 12 17:27:39 compute-1-5 kernel: [] __down+0x72/0xb0
Oct 12 17:27:39 compute-1-5 kernel: [] down+0x41/0x50
Oct 12 17:27:39 compute-1-5 kernel: [] os_acquire_mutex+0x45/0x50 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] _nv016673rm+0x18/0x30 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv018782rm+0x3d/0x120 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv021412rm+0x8b5/0x9d0 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv000815rm+0x225/0xbb0 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? rm_execute_work_item+0x49/0xc0 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? os_execute_work_item+0x0/0x80 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? os_execute_work_item+0x50/0x80 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? worker_thread+0x170/0x2a0
Oct 12 17:27:39 compute-1-5 kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 12 17:27:39 compute-1-5 kernel: [] ? worker_thread+0x0/0x2a0
Oct 12 17:27:39 compute-1-5 kernel: [] ? kthread+0x9e/0xc0
Oct 12 17:27:39 compute-1-5 kernel: [] ? child_rip+0xa/0x20
Oct 12 17:27:39 compute-1-5 kernel: [] ? kthread+0x0/0xc0
Oct 12 17:27:39 compute-1-5 kernel: [] ? child_rip+0x0/0x20
Oct 12 17:27:39 compute-1-5 kernel: INFO: task events/8:139 blocked for more than 120 seconds.
Oct 12 17:27:39 compute-1-5 kernel: Tainted: P --------------- 2.6.32-504.16.2.el6.x86_64 #1
Oct 12 17:27:39 compute-1-5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 17:27:39 compute-1-5 kernel: events/8 D 0000000000000008 0 139 2 0x00000000
Oct 12 17:27:39 compute-1-5 kernel: ffff88087401dbb0 0000000000000046 0000000000000000 0000000000000000
Oct 12 17:27:39 compute-1-5 kernel: ffff881067f779c0 0000000000000282 000001467bacc859 0000000000000282
Oct 12 17:27:39 compute-1-5 kernel: ffff88087401db30 000000010010c94f ffff8808740125f8 ffff88087401dfd8
Oct 12 17:27:39 compute-1-5 kernel: Call Trace:
Oct 12 17:27:39 compute-1-5 kernel: [] schedule_timeout+0x215/0x2e0
Oct 12 17:27:39 compute-1-5 kernel: [] ? cpumask_next_and+0x29/0x50
Oct 12 17:27:39 compute-1-5 kernel: [] ? find_busiest_group+0x8fe/0x9e0
Oct 12 17:27:39 compute-1-5 kernel: [] __down+0x72/0xb0
Oct 12 17:27:39 compute-1-5 kernel: [] down+0x41/0x50
Oct 12 17:27:39 compute-1-5 kernel: [] os_acquire_mutex+0x45/0x50 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] _nv016673rm+0x18/0x30 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv018782rm+0x3d/0x120 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv021412rm+0x8b5/0x9d0 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv000815rm+0x225/0xbb0 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? rm_execute_work_item+0x49/0xc0 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? os_execute_work_item+0x0/0x80 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? os_execute_work_item+0x50/0x80 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? worker_thread+0x170/0x2a0
Oct 12 17:27:39 compute-1-5 kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 12 17:27:39 compute-1-5 kernel: [] ? worker_thread+0x0/0x2a0
Oct 12 17:27:39 compute-1-5 kernel: [] ? kthread+0x9e/0xc0
Oct 12 17:27:39 compute-1-5 kernel: [] ? child_rip+0xa/0x20
Oct 12 17:27:39 compute-1-5 kernel: [] ? kthread+0x0/0xc0
Oct 12 17:27:39 compute-1-5 kernel: [] ? child_rip+0x0/0x20
Oct 12 17:27:39 compute-1-5 kernel: INFO: task batchCUBLAS:7138 blocked for more than 120 seconds.
Oct 12 17:27:39 compute-1-5 kernel: Tainted: P --------------- 2.6.32-504.16.2.el6.x86_64 #1
Oct 12 17:27:39 compute-1-5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 17:27:39 compute-1-5 kernel: batchCUBLAS D 0000000000000018 0 7138 7131 0x00000000
Oct 12 17:27:39 compute-1-5 kernel: ffff881067079948 0000000000000046 ffff881067079898 ffff881073539558
Oct 12 17:27:39 compute-1-5 kernel: ffff88089c535928 ffff88089c5358c0 0000000000000018 0000000000000019
Oct 12 17:27:39 compute-1-5 kernel: ffff8810670798e8 ffffffff8106d534 ffff88106cf1c5f8 ffff881067079fd8
Oct 12 17:27:39 compute-1-5 kernel: Call Trace:
Oct 12 17:27:39 compute-1-5 kernel: [] ? enqueue_task_fair+0x64/0x100
Oct 12 17:27:39 compute-1-5 kernel: [] ? check_preempt_curr+0x6d/0x90
Oct 12 17:27:39 compute-1-5 kernel: [] schedule_timeout+0x215/0x2e0
Oct 12 17:27:39 compute-1-5 kernel: [] ? wake_up_process+0x15/0x20
Oct 12 17:27:39 compute-1-5 kernel: [] ? __up+0x2a/0x40
Oct 12 17:27:39 compute-1-5 kernel: [] __down+0x72/0xb0
Oct 12 17:27:39 compute-1-5 kernel: [] down+0x41/0x50
Oct 12 17:27:39 compute-1-5 kernel: [] find_uuid+0x47/0x90 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] nvidia_dev_put_uuid+0x1e/0x50 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] nvUvmInterfaceUnregisterGpu+0x2c/0x40 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] remove_gpu+0x1f4/0x2c0 [nvidia_uvm]
Oct 12 17:27:39 compute-1-5 kernel: [] uvm_gpu_release_locked+0x25/0x30 [nvidia_uvm]
Oct 12 17:27:39 compute-1-5 kernel: [] uvm_va_space_destroy+0x319/0x360 [nvidia_uvm]
Oct 12 17:27:39 compute-1-5 kernel: [] uvm_release+0x11/0x20 [nvidia_uvm]
Oct 12 17:27:39 compute-1-5 kernel: [] __fput+0xf5/0x210
Oct 12 17:27:39 compute-1-5 kernel: [] fput+0x25/0x30
Oct 12 17:27:39 compute-1-5 kernel: [] filp_close+0x5d/0x90
Oct 12 17:27:39 compute-1-5 kernel: [] put_files_struct+0x7f/0xf0
Oct 12 17:27:39 compute-1-5 kernel: [] exit_files+0x53/0x70
Oct 12 17:27:39 compute-1-5 kernel: [] do_exit+0x18d/0x870
Oct 12 17:27:39 compute-1-5 kernel: [] do_group_exit+0x58/0xd0
Oct 12 17:27:39 compute-1-5 kernel: [] get_signal_to_deliver+0x1f6/0x460
Oct 12 17:27:39 compute-1-5 kernel: [] do_signal+0x75/0x800
Oct 12 17:27:39 compute-1-5 kernel: [] ? lru_add_drain_cpu+0x8b/0xc0
Oct 12 17:27:39 compute-1-5 kernel: [] ? sys_futex+0x7b/0x170
Oct 12 17:27:39 compute-1-5 kernel: [] do_notify_resume+0x90/0xc0
Oct 12 17:27:39 compute-1-5 kernel: [] int_signal+0x12/0x17
Oct 12 17:27:39 compute-1-5 kernel: INFO: task batchCUBLAS:7140 blocked for more than 120 seconds.
Oct 12 17:27:39 compute-1-5 kernel: Tainted: P --------------- 2.6.32-504.16.2.el6.x86_64 #1
Oct 12 17:27:39 compute-1-5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 17:27:39 compute-1-5 kernel: batchCUBLAS D 000000000000000f 0 7140 7139 0x00000000
Oct 12 17:27:39 compute-1-5 kernel: ffff88083aa65b58 0000000000000082 ffff88083aa65be8 0000000000000296
Oct 12 17:27:39 compute-1-5 kernel: ffff88083aa65ad8 0000000000000292 ffff88083aa65ad8 0000000000000292
Oct 12 17:27:39 compute-1-5 kernel: ffffffff8100bb8e ffff88083aa65b58 ffff880864771ad8 ffff88083aa65fd8
Oct 12 17:27:39 compute-1-5 kernel: Call Trace:
Oct 12 17:27:39 compute-1-5 kernel: [] ? apic_timer_interrupt+0xe/0x20
Oct 12 17:27:39 compute-1-5 kernel: [] ? mutex_spin_on_owner+0x9f/0xc0
Oct 12 17:27:39 compute-1-5 kernel: [] __mutex_lock_slowpath+0x96/0x210
Oct 12 17:27:39 compute-1-5 kernel: [] mutex_lock+0x2b/0x50
Oct 12 17:27:39 compute-1-5 kernel: [] uvm_gpu_release+0x1d/0x40 [nvidia_uvm]
Oct 12 17:27:39 compute-1-5 kernel: [] uvm_ext_gpu_map_free+0x1b/0x20 [nvidia_uvm]
Oct 12 17:27:39 compute-1-5 kernel: [] uvm_deferred_free_object_list+0x8e/0xf0 [nvidia_uvm]
Oct 12 17:27:39 compute-1-5 kernel: [] ? uvm_ext_gpu_map_destroy+0xb8/0x110 [nvidia_uvm]
Oct 12 17:27:39 compute-1-5 kernel: [] uvm_api_unmap_external_allocation+0xf5/0x130 [nvidia_uvm]
Oct 12 17:27:39 compute-1-5 kernel: [] uvm_unlocked_ioctl+0x769/0xdd0 [nvidia_uvm]
Oct 12 17:27:39 compute-1-5 kernel: [] ? __pagevec_free+0x42/0x90
Oct 12 17:27:39 compute-1-5 kernel: [] ? __dec_zone_page_state+0x2e/0x30
Oct 12 17:27:39 compute-1-5 kernel: [] ? release_pages+0x21c/0x250
Oct 12 17:27:39 compute-1-5 kernel: [] vfs_ioctl+0x22/0xa0
Oct 12 17:27:39 compute-1-5 kernel: [] ? unmap_region+0x110/0x130
Oct 12 17:27:39 compute-1-5 kernel: [] do_vfs_ioctl+0x84/0x580
Oct 12 17:27:39 compute-1-5 kernel: [] ? remove_vma+0x6e/0x90
Oct 12 17:27:39 compute-1-5 kernel: [] ? do_munmap+0x317/0x3b0
Oct 12 17:27:39 compute-1-5 kernel: [] sys_ioctl+0x81/0xa0
Oct 12 17:27:39 compute-1-5 kernel: [] system_call_fastpath+0x16/0x1b
Oct 12 17:27:39 compute-1-5 kernel: INFO: task batchCUBLAS:7147 blocked for more than 120 seconds.
Oct 12 17:27:39 compute-1-5 kernel: Tainted: P --------------- 2.6.32-504.16.2.el6.x86_64 #1
Oct 12 17:27:39 compute-1-5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 17:27:39 compute-1-5 kernel: batchCUBLAS D 0000000000000012 0 7147 7139 0x00000000
Oct 12 17:27:39 compute-1-5 kernel: ffff8810562c7af8 0000000000000082 0000000000000000 ffff88087fc2a800
Oct 12 17:27:39 compute-1-5 kernel: ffff8810562c7b28 ffffffff81063c23 00000149d9629a47 0000000000000000
Oct 12 17:27:39 compute-1-5 kernel: 0000000000000000 000000010010f661 ffff881071db85f8 ffff8810562c7fd8
Oct 12 17:27:39 compute-1-5 kernel: Call Trace:
Oct 12 17:27:39 compute-1-5 kernel: [] ? perf_event_task_sched_out+0x33/0x70
Oct 12 17:27:39 compute-1-5 kernel: [] schedule_timeout+0x215/0x2e0
Oct 12 17:27:39 compute-1-5 kernel: [] ? thread_return+0x4e/0x7d0
Oct 12 17:27:39 compute-1-5 kernel: [] ? __hrtimer_start_range_ns+0x1a3/0x460
Oct 12 17:27:39 compute-1-5 kernel: [] __down+0x72/0xb0
Oct 12 17:27:39 compute-1-5 kernel: [] down+0x41/0x50
Oct 12 17:27:39 compute-1-5 kernel: [] os_acquire_mutex+0x45/0x50 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] _nv016673rm+0x18/0x30 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv018782rm+0x3d/0x120 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv018830rm+0x55e/0x650 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv000793rm+0x12/0x20 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv003174rm+0x1e54/0x3280 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv000839rm+0x667/0x800 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? rm_ioctl+0x73/0x100 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? nvidia_ioctl+0x15c/0x500 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? __do_page_fault+0x1f4/0x500
Oct 12 17:27:39 compute-1-5 kernel: [] ? nvidia_frontend_ioctl+0x33/0x40 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? nvidia_frontend_unlocked_ioctl+0x21/0x30 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? vfs_ioctl+0x22/0xa0
Oct 12 17:27:39 compute-1-5 kernel: [] ? do_vfs_ioctl+0x84/0x580
Oct 12 17:27:39 compute-1-5 kernel: [] ? sys_futex+0x7b/0x170
Oct 12 17:27:39 compute-1-5 kernel: [] ? sys_ioctl+0x81/0xa0
Oct 12 17:27:39 compute-1-5 kernel: [] ? posix_get_monotonic_raw+0x11/0x20
Oct 12 17:27:39 compute-1-5 kernel: [] ? system_call_fastpath+0x16/0x1b
Oct 12 17:27:39 compute-1-5 kernel: INFO: task batchCUBLAS:7142 blocked for more than 120 seconds.
Oct 12 17:27:39 compute-1-5 kernel: Tainted: P --------------- 2.6.32-504.16.2.el6.x86_64 #1
Oct 12 17:27:39 compute-1-5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 17:27:39 compute-1-5 kernel: batchCUBLAS D 0000000000000001 0 7142 7141 0x00000000
Oct 12 17:27:39 compute-1-5 kernel: ffff88083130baf8 0000000000000086 0000000000000000 0000000000000000
Oct 12 17:27:39 compute-1-5 kernel: ffff880000032b48 0000000300000001 0000014681354f97 ffff880000029e18
Oct 12 17:27:39 compute-1-5 kernel: 000000003130bb98 000000010010b9b9 ffff88084b2c5ad8 ffff88083130bfd8
Oct 12 17:27:39 compute-1-5 kernel: Call Trace:
Oct 12 17:27:39 compute-1-5 kernel: [] schedule_timeout+0x215/0x2e0
Oct 12 17:27:39 compute-1-5 kernel: [] ? __alloc_pages_nodemask+0x113/0x8d0
Oct 12 17:27:39 compute-1-5 kernel: [] __down+0x72/0xb0
Oct 12 17:27:39 compute-1-5 kernel: [] down+0x41/0x50
Oct 12 17:27:39 compute-1-5 kernel: [] os_acquire_mutex+0x45/0x50 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] _nv016673rm+0x18/0x30 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv018782rm+0x3d/0x120 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv018830rm+0x55e/0x650 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv000793rm+0x12/0x20 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv003174rm+0x1e54/0x3280 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv000839rm+0x667/0x800 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? rm_ioctl+0x73/0x100 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? nvidia_ioctl+0x15c/0x500 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? nvidia_frontend_ioctl+0x33/0x40 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? nvidia_frontend_unlocked_ioctl+0x21/0x30 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? vfs_ioctl+0x22/0xa0
Oct 12 17:27:39 compute-1-5 kernel: [] ? do_vfs_ioctl+0x84/0x580
Oct 12 17:27:39 compute-1-5 kernel: [] ? thread_return+0x4e/0x7d0
Oct 12 17:27:39 compute-1-5 kernel: [] ? native_read_tsc+0x6/0x20
Oct 12 17:27:39 compute-1-5 kernel: [] ? sys_ioctl+0x81/0xa0
Oct 12 17:27:39 compute-1-5 kernel: [] ? posix_get_monotonic_raw+0x11/0x20
Oct 12 17:27:39 compute-1-5 kernel: [] ? system_call_fastpath+0x16/0x1b
Oct 12 17:27:39 compute-1-5 kernel: INFO: task batchCUBLAS:7148 blocked for more than 120 seconds.
Oct 12 17:27:39 compute-1-5 kernel: Tainted: P --------------- 2.6.32-504.16.2.el6.x86_64 #1
Oct 12 17:27:39 compute-1-5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 17:27:39 compute-1-5 kernel: batchCUBLAS D 0000000000000011 0 7148 7141 0x00000000
Oct 12 17:27:39 compute-1-5 kernel: ffff88103f9f1c98 0000000000000086 0000000000000000 ffffffff810a3e9f
Oct 12 17:27:39 compute-1-5 kernel: ffff88103f9f1c90 ffff881071a74040 0000014a026a6db4 0000000000000286
Oct 12 17:27:39 compute-1-5 kernel: ffff88103f9f1d08 000000010010f661 ffff881071a745f8 ffff88103f9f1fd8
Oct 12 17:27:39 compute-1-5 kernel: Call Trace:
Oct 12 17:27:39 compute-1-5 kernel: [] ? hrtimer_try_to_cancel+0x3f/0xd0
Oct 12 17:27:39 compute-1-5 kernel: [] schedule_timeout+0x215/0x2e0
Oct 12 17:27:39 compute-1-5 kernel: [] __down+0x72/0xb0
Oct 12 17:27:39 compute-1-5 kernel: [] down+0x41/0x50
Oct 12 17:27:39 compute-1-5 kernel: [] nvidia_ioctl+0x71/0x500 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? __do_page_fault+0x1f4/0x500
Oct 12 17:27:39 compute-1-5 kernel: [] nvidia_frontend_ioctl+0x33/0x40 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] nvidia_frontend_unlocked_ioctl+0x21/0x30 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] vfs_ioctl+0x22/0xa0
Oct 12 17:27:39 compute-1-5 kernel: [] do_vfs_ioctl+0x84/0x580
Oct 12 17:27:39 compute-1-5 kernel: [] ? sys_futex+0x7b/0x170
Oct 12 17:27:39 compute-1-5 kernel: [] sys_ioctl+0x81/0xa0
Oct 12 17:27:39 compute-1-5 kernel: [] ? posix_get_monotonic_raw+0x11/0x20
Oct 12 17:27:39 compute-1-5 kernel: [] system_call_fastpath+0x16/0x1b
Oct 12 17:27:39 compute-1-5 kernel: INFO: task batchCUBLAS:7150 blocked for more than 120 seconds.
Oct 12 17:27:39 compute-1-5 kernel: Tainted: P --------------- 2.6.32-504.16.2.el6.x86_64 #1
Oct 12 17:27:39 compute-1-5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 17:27:39 compute-1-5 kernel: batchCUBLAS D 0000000000000019 0 7150 7149 0x00000000
Oct 12 17:27:39 compute-1-5 kernel: ffff8810635e7818 0000000000000082 0000000000000000 000200d000000000
Oct 12 17:27:39 compute-1-5 kernel: 0000000000000282 0000000000000010 0000014674bc8e94 ffff880880010dc0
Oct 12 17:27:39 compute-1-5 kernel: ffffffff81aac4c0 000000010010c853 ffff881073539ad8 ffff8810635e7fd8
Oct 12 17:27:39 compute-1-5 kernel: Call Trace:
Oct 12 17:27:39 compute-1-5 kernel: [] schedule_timeout+0x215/0x2e0
Oct 12 17:27:39 compute-1-5 kernel: [] ? cache_grow+0x217/0x320
Oct 12 17:27:39 compute-1-5 kernel: [] __down+0x72/0xb0
Oct 12 17:27:39 compute-1-5 kernel: [] down+0x41/0x50
Oct 12 17:27:39 compute-1-5 kernel: [] nv_get_adapter_state+0x24/0xb0 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] os_pci_init_handle+0x50/0x120 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] _nv003246rm+0x6f/0xc0 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv010981rm+0xbe/0x120 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv010994rm+0x3c8/0x440 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv011305rm+0x37f/0x6b0 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv011537rm+0x117/0x290 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv016801rm+0x355/0x4a0 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? _nv000832rm+0xe6/0x6a0 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? rm_init_adapter+0x6a/0x100 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? nv_open_device+0x1db/0x690 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? nvidia_open+0x14f/0x300 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? nvidia_frontend_open+0xb6/0x150 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? cdev_get+0x31/0xe0
Oct 12 17:27:39 compute-1-5 kernel: [] ? chrdev_open+0x125/0x230
Oct 12 17:27:39 compute-1-5 kernel: [] ? mntput_no_expire+0x30/0x110
Oct 12 17:27:39 compute-1-5 kernel: [] ? chrdev_open+0x0/0x230
Oct 12 17:27:39 compute-1-5 kernel: [] ? __dentry_open+0x10a/0x360
Oct 12 17:27:39 compute-1-5 kernel: [] ? security_inode_permission+0x1f/0x30
Oct 12 17:27:39 compute-1-5 kernel: [] ? nameidata_to_filp+0x54/0x70
Oct 12 17:27:39 compute-1-5 kernel: [] ? do_filp_open+0x6d0/0xd20
Oct 12 17:27:39 compute-1-5 kernel: [] ? cp_new_stat+0xe4/0x100
Oct 12 17:27:39 compute-1-5 kernel: [] ? strncpy_from_user+0x4a/0x90
Oct 12 17:27:39 compute-1-5 kernel: [] ? alloc_fd+0x92/0x160
Oct 12 17:27:39 compute-1-5 kernel: [] ? do_sys_open+0x67/0x130
Oct 12 17:27:39 compute-1-5 kernel: [] ? sys_open+0x20/0x30
Oct 12 17:27:39 compute-1-5 kernel: [] ? system_call_fastpath+0x16/0x1b
Oct 12 17:27:39 compute-1-5 kernel: INFO: task nvidia-smi:7154 blocked for more than 120 seconds.
Oct 12 17:27:39 compute-1-5 kernel: Tainted: P --------------- 2.6.32-504.16.2.el6.x86_64 #1
Oct 12 17:27:39 compute-1-5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 17:27:39 compute-1-5 kernel: nvidia-smi D 000000000000001a 0 7154 7152 0x00000000
Oct 12 17:27:39 compute-1-5 kernel: ffff8808626e3b48 0000000000000086 ffff8808626e3a98 ffffffff8113b3c1
Oct 12 17:27:39 compute-1-5 kernel: ffff8808626e3ac8 ffffffff811598ed ffff881073bd06b8 ffff8810727a4080
Oct 12 17:27:39 compute-1-5 kernel: ffffea00393c4c20 ffffea00387da148 ffff880872dbdad8 ffff8808626e3fd8
Oct 12 17:27:39 compute-1-5 kernel: Call Trace:
Oct 12 17:27:39 compute-1-5 kernel: [] ? lru_cache_add_lru+0x21/0x40
Oct 12 17:27:39 compute-1-5 kernel: [] ? page_add_new_anon_rmap+0x9d/0xf0
Oct 12 17:27:39 compute-1-5 kernel: [] ? inode_init_always+0x11e/0x1c0
Oct 12 17:27:39 compute-1-5 kernel: [] schedule_timeout+0x215/0x2e0
Oct 12 17:27:39 compute-1-5 kernel: [] ? dput+0x9a/0x150
Oct 12 17:27:39 compute-1-5 kernel: [] ? __d_lookup+0xa7/0x150
Oct 12 17:27:39 compute-1-5 kernel: [] __down+0x72/0xb0
Oct 12 17:27:39 compute-1-5 kernel: [] down+0x41/0x50
Oct 12 17:27:39 compute-1-5 kernel: [] nvidia_frontend_open+0x36/0x150 [nvidia]
Oct 12 17:27:39 compute-1-5 kernel: [] ? cdev_get+0x31/0xe0
Oct 12 17:27:39 compute-1-5 kernel: [] chrdev_open+0x125/0x230
Oct 12 17:27:39 compute-1-5 kernel: [] ? mntput_no_expire+0x30/0x110
Oct 12 17:27:39 compute-1-5 kernel: [] ? chrdev_open+0x0/0x230
Oct 12 17:27:39 compute-1-5 kernel: [] __dentry_open+0x10a/0x360
Oct 12 17:27:39 compute-1-5 kernel: [] ? security_inode_permission+0x1f/0x30
Oct 12 17:27:39 compute-1-5 kernel: [] nameidata_to_filp+0x54/0x70
Oct 12 17:27:39 compute-1-5 kernel: [] do_filp_open+0x6d0/0xd20
Oct 12 17:27:39 compute-1-5 kernel: [] ? cp_new_stat+0xe4/0x100
Oct 12 17:27:39 compute-1-5 kernel: [] ? strncpy_from_user+0x4a/0x90
Oct 12 17:27:39 compute-1-5 kernel: [] ? alloc_fd+0x92/0x160
Oct 12 17:27:39 compute-1-5 kernel: [] do_sys_open+0x67/0x130
Oct 12 17:27:39 compute-1-5 kernel: [] sys_open+0x20/0x30
Oct 12 17:27:39 compute-1-5 kernel: [] system_call_fastpath+0x16/0x1b
Oct 12 17:29:39 compute-1-5 kernel: INFO: task events/0:131 blocked for more than 120 seconds.
Oct 12 17:29:39 compute-1-5 kernel: Tainted: P --------------- 2.6.32-504.16.2.el6.x86_64 #1
Oct 12 17:29:39 compute-1-5 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 12 17:29:39 compute-1-5 kernel: events/0 D 0000000000000000 0 131 2 0x00000000
Oct 12 17:29:39 compute-1-5 kernel: ffff8808747e1bb0 0000000000000046 0000000000000000 ffffffffa0406dfc
Oct 12 17:29:39 compute-1-5 kernel: ffff88083b88f2c0 ffffffffa0406e98 0000014676998e4e 0000000000000000
Oct 12 17:29:39 compute-1-5 kernel: 0000000000000000 000000010010c846 ffff8808747dfad8 ffff8808747e1fd8
Oct 12 17:29:39 compute-1-5 kernel: Call Trace:
Oct 12 17:29:39 compute-1-5 kernel: [] ? _nv011452rm+0xac/0x190 [nvidia]
Oct 12 17:29:39 compute-1-5 kernel: [] ? _nv011452rm+0x148/0x190 [nvidia]
Oct 12 17:29:39 compute-1-5 kernel: [] schedule_timeout+0x215/0x2e0
Oct 12 17:29:39 compute-1-5 kernel: [] ? cpumask_next_and+0x29/0x50
Oct 12 17:29:39 compute-1-5 kernel: [] ? find_busiest_group+0x8fe/0x9e0
Oct 12 17:29:39 compute-1-5 kernel: [] __down+0x72/0xb0
Oct 12 17:29:39 compute-1-5 kernel: [] down+0x41/0x50
Oct 12 17:29:39 compute-1-5 kernel: [] os_acquire_mutex+0x45/0x50 [nvidia]
Oct 12 17:29:39 compute-1-5 kernel: [] _nv016673rm+0x18/0x30 [nvidia]
Oct 12 17:29:39 compute-1-5 kernel: [] ? _nv018782rm+0x3d/0x120 [nvidia]
Oct 12 17:29:39 compute-1-5 kernel: [] ? _nv021412rm+0x8b5/0x9d0 [nvidia]
Oct 12 17:29:39 compute-1-5 kernel: [] ? _nv000815rm+0x225/0xbb0 [nvidia]
Oct 12 17:29:39 compute-1-5 kernel: [] ? rm_execute_work_item+0x49/0xc0 [nvidia]
Oct 12 17:29:39 compute-1-5 kernel: [] ? os_execute_work_item+0x0/0x80 [nvidia]
Oct 12 17:29:39 compute-1-5 kernel: [] ? os_execute_work_item+0x50/0x80 [nvidia]
Oct 12 17:29:39 compute-1-5 kernel: [] ? worker_thread+0x170/0x2a0
Oct 12 17:29:39 compute-1-5 kernel: [] ? autoremove_wake_function+0x0/0x40
Oct 12 17:29:39 compute-1-5 kernel: [] ? worker_thread+0x0/0x2a0
Oct 12 17:29:39 compute-1-5 kernel: [] ? kthread+0x9e/0xc0
Oct 12 17:29:39 compute-1-5 kernel: [] ? child_rip+0xa/0x20
Oct 12 17:29:39 compute-1-5 kernel: [] ? kthread+0x0/0xc0
Oct 12 17:29:39 compute-1-5 kernel: [] ? child_rip+0x0/0x20
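
Since the log above is long, here is a minimal Python sketch that condenses hung-task reports to one line per blocked task, showing the first reliable (non-`?`) stack frame that lives in a kernel module. It assumes the syslog layout shown in this post; the function name `summarize_hung_tasks` and the regular expressions are my own illustration, not part of any NVIDIA or kernel tool.

```python
import re

# Hung-task header printed by the kernel, e.g.
# "INFO: task batchCUBLAS:7138 blocked for more than 120 seconds."
HEADER = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than \d+ seconds"
)

# A reliable stack frame (no "?" marker) that belongs to a module, e.g.
# "kernel: [] os_acquire_mutex+0x45/0x50 [nvidia]"
MODULE_FRAME = re.compile(
    r"kernel: \[[^\]]*\] (?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<mod>\w+)\]"
)


def summarize_hung_tasks(lines):
    """Map (task name, pid) to its first reliable module frame, in log order.

    A task that is reported more than once keeps the frame from its first
    report; a task with no module frame maps to None.
    """
    summary = {}
    current = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            current = (header.group("name"), int(header.group("pid")))
            summary.setdefault(current, None)
            continue
        frame = MODULE_FRAME.search(line)
        if frame and current is not None and summary[current] is None:
            summary[current] = f"{frame.group('sym')} [{frame.group('mod')}]"
    return summary


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = [
        "Oct 12 17:27:39 compute-1-5 kernel: INFO: task batchCUBLAS:7138 blocked for more than 120 seconds.",
        "Oct 12 17:27:39 compute-1-5 kernel: [] ? enqueue_task_fair+0x64/0x100",
        "Oct 12 17:27:39 compute-1-5 kernel: [] schedule_timeout+0x215/0x2e0",
        "Oct 12 17:27:39 compute-1-5 kernel: [] down+0x41/0x50",
        "Oct 12 17:27:39 compute-1-5 kernel: [] find_uuid+0x47/0x90 [nvidia]",
    ]
    for (task, pid), frame in summarize_hung_tasks(sample).items():
        print(f"{task}:{pid} -> {frame}")
```

To run it on the full log, pass the lines of /var/log/messages to `summarize_hung_tasks` instead of the sample list. The frames it reports are only the points where each task is waiting; they do not by themselves say which task holds the lock.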

I’ve run the same test with the updated driver (367.57) and got the same error. This node is part of a cluster; the error occurs at random, and each time we have to reboot the node.

Any idea what causes this error? Is it a kernel issue, a driver issue, or a hardware issue?

I can reproduce it just by running the batchCUBLAS program from the samples folder.

Yes, the system hangs.