My server shuts down with nvidia-docker2

I deployed Kubernetes (version 1.11.5) on my server, with GPUs exposed through the device plugin mode. But now the server sometimes goes down.

My server environment (docker info output):

Containers: 8
 Running: 4
 Paused: 0
 Stopped: 4
Images: 125
Server Version: 18.09.0
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: nvidia
Init Binary: docker-init
containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871-dirty
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 72
Total Memory: 141.4GiB
Name: node1
ID: 4BUK:ODUP:6KLA:ROHU:XLRT:2KQ4:V6LP:U2AM:UBDD:AEIR:FDV6:ZUN7
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 registry:5500
 127.0.0.0/8
Registry Mirrors:
 https://registry.docker-cn.com/
 https://docker.mirrors.ustc.edu.cn/
Live Restore Enabled: false
Product License: Community Engine
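The "Default Runtime: nvidia" line above normally comes from the Docker daemon configuration. A minimal sketch, assuming a standard nvidia-docker2 installation (the file path /etc/docker/daemon.json and the runtime path are the usual defaults, not taken from this report; the file is written locally here just for illustration):

```shell
# Sketch of the daemon.json that nvidia-docker2 typically installs to make
# "nvidia" the default runtime (assumed defaults, not copied from this node).
cat > daemon.json <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
# Quick sanity check that the default runtime is set as expected:
grep '"default-runtime"' daemon.json
```

After editing the real /etc/docker/daemon.json, dockerd has to be restarted (e.g. systemctl restart docker) for the default runtime to take effect.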

The kernel crash log (dmesg):
[ 161.704977] INFO: Object 0xffff88236535ff18 @offset=3864
[ 161.705176] INFO: Object 0xffff88236535ff50 @offset=3920
[ 161.705346] INFO: Object 0xffff88236535ff88 @offset=3976
[ 161.705554] INFO: Object 0xffff88236535ffc0 @offset=4032
[ 161.705746] general protection fault: 0000 [#1] SMP
[ 161.705952] Modules linked in: veth vxlan ip6_udp_tunnel udp_tunnel rbd fuse ceph libceph dns_resolver ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables br_netfilter bridge stp llc xt_statistic xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_recent ipt_REJECT nf_reject_ipv4 xt_mark nf_conntrack_netlink xt_comment xt_conntrack ip_set nfnetlink xt_addrtype iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack nvidia_uvm(POE) overlay(T) team_mode_roundrobin team ext4 mbcache jbd2 iptable_filter sunrpc nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) vfat fat sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass iTCO_wdt iTCO_vendor_support crc32_pclmul ghash_clmulni_intel aesni_intel i2c_i801
[ 161.707695] lpc_ich lrw gf128mul glue_helper ablk_helper ses enclosure scsi_transport_sas joydev pcspkr mei_me mei ipmi_ssif cryptd sg ipmi_si ipmi_devintf ipmi_msghandler shpchp acpi_power_meter ip_tables xfs sd_mod crc_t10dif crct10dif_generic ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops bnx2x ttm mxm_wmi ahci drm libahci libata crct10dif_pclmul crct10dif_common crc32c_intel igb aacraid dca i2c_algo_bit mdio i2c_core ptp pps_core libcrc32c wmi dm_mirror dm_region_hash dm_log dm_mod
[ 161.710060] CPU: 52 PID: 52350 Comm: fpsserver-10001 Tainted: P B OE ------------ T 3.10.0-693.el7.x86_64 #1
[ 161.710698] Hardware name: New H3C Technologies Co., Ltd. UIS R390X G2/RS32M2C9S, BIOS 1.01.19P01 03/20/2018
[ 161.711252] task: ffff8822ab6dbf40 ti: ffff882291630000 task.ti: ffff882291630000
[ 161.711786] RIP: 0010:[] [] kmem_cache_close+0xc9/0x2e0
[ 161.712309] RSP: 0000:ffff882291633a68 EFLAGS: 00010202
[ 161.712917] RAX: ffff881234d54101 RBX: ffff88236535fff8 RCX: 000000000000100a
[ 161.713444] RDX: 0000000000001009 RSI: 0000000000000246 RDI: ffff88017fc03d00
[ 161.714043] RBP: ffff882291633ab8 R08: 0000000000019b80 R09: ffffffff811e19c5
[ 161.714680] R10: ffff881237c99b80 R11: ffffea0048d35500 R12: ffff8811d7279600
[ 161.715404] R13: ffffea008d94d7c0 R14: ffff88236535f000 R15: dead0000000000e0
[ 161.716139] FS: 00007f33417fa700(0000) GS:ffff881237c80000(0000) knlGS:0000000000000000
[ 161.716750] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 161.717455] CR2: 00007fc1ec185000 CR3: 00000000019f2000 CR4: 00000000003407e0
[ 161.718150] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 161.718795] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 161.719685] Stack:
[ 161.720418] ffff8811d7279608 000000017f7c9100 ffff8823ed676880 ffff881234d54140
[ 161.721170] ffff8823ed676890 ffff8811d7279600 ffff8811d7279600 ffff88247f7c9100
[ 161.721883] ffffc90057a53008 0000000000000000 ffff882291633ad8 ffffffff811e1ad4
[ 161.722558] Call Trace:
[ 161.723271] [] __kmem_cache_shutdown+0x14/0x80
[ 161.724005] [] kmem_cache_destroy+0x44/0xf0
[ 161.724778] [] kmem_cache_destroy_memcg_children+0x89/0xb0
[ 161.725634] [] kmem_cache_destroy+0x19/0xf0
[ 161.726340] [] deinit_chunk_split_cache+0x77/0xa0 [nvidia_uvm]
[ 161.727187] [] uvm_pmm_gpu_deinit+0x50/0x60 [nvidia_uvm]
[ 161.727958] [] remove_gpu+0x275/0x300 [nvidia_uvm]
[ 161.728686] [] uvm_gpu_release_locked+0x21/0x30 [nvidia_uvm]
[ 161.729777] [] uvm_va_space_destroy+0x36c/0x3e0 [nvidia_uvm]
[ 161.730565] [] uvm_release+0x11/0x20 [nvidia_uvm]
[ 161.731406] [] __fput+0xe9/0x260
[ 161.732213] [] ____fput+0xe/0x10
[ 161.733046] [] task_work_run+0xc5/0xf0
[ 161.733891] [] do_exit+0x2d1/0xa40
[ 161.734661] [] ? dequeue_entity+0x11c/0x5d0
[ 161.735610] [] do_group_exit+0x3f/0xa0
[ 161.736367] [] get_signal_to_deliver+0x1ce/0x5e0
[ 161.737225] [] do_signal+0x57/0x6c0
[ 161.738024] [] ? hrtimer_cancel+0x28/0x40
[ 161.738754] [] ? hrtimer_nanosleep+0xbb/0x180
[ 161.739574] [] ? hrtimer_get_res+0x50/0x50
[ 161.740262] [] do_notify_resume+0x5f/0xb0
[ 161.741094] [] int_signal+0x12/0x17
[ 161.741866] Code: f7 75 44 e9 4a 01 00 00 66 2e 0f 1f 84 00 00 00 00 00 e8 3b c1 15 00 48 8b 45 c0 4c 89 ee 4c 89 e7 48 83 68 08 01 e8 67 b1 ff ff <49> 8b 47 20 49 8d 7f 20 48 83 e8 20 48 3b 7d d0 0f 84 11 01 00
[ 161.743358] RIP [] kmem_cache_close+0xc9/0x2e0
[ 161.744035] RSP

My NVIDIA device plugin image: nvidia/k8s-device-plugin:1.11
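When attaching a report like the one above, the relevant crash lines can be isolated from a saved kernel log with a simple grep. A sketch (the filename kern.log and the inline sample lines are illustrative stand-ins, not taken from the node):

```shell
# Write a small sample of the crash above to a file, standing in for a
# saved kernel log (the filename kern.log is hypothetical).
cat > kern.log <<'EOF'
[  161.705746] general protection fault: 0000 [#1] SMP
[  161.726340] [] deinit_chunk_split_cache+0x77/0xa0 [nvidia_uvm]
[  161.727187] [] uvm_pmm_gpu_deinit+0x50/0x60 [nvidia_uvm]
[  161.733891] [] do_exit+0x2d1/0xa40
EOF
# Keep only the fault line and the nvidia_uvm frames:
grep -E 'general protection fault|nvidia_uvm' kern.log
```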

Hi,
Can you provide a log generated by nvidia-bug-report.sh for further analysis?

Have you resolved this issue? If not, can you provide a log generated by nvidia-bug-report.sh for further analysis?