#after cuda-memtest nvidia-smi -q result

GPU 00000000:89:00.0
    ECC Errors
        Volatile
            Single Bit
                Device Memory           : 0
                Register File           : 0
                L1 Cache                : 0
                L2 Cache                : 0
                Texture Memory          : N/A
                Texture Shared          : N/A
                CBU                     : N/A
                Total                   : 0
            Double Bit
                Device Memory           : 0
                Register File           : 0
                L1 Cache                : 0
                L2 Cache                : 0
                Texture Memory          : N/A
                Texture Shared          : N/A
                CBU                     : 0
                Total                   : 0
        Aggregate
            Single Bit
                Device Memory           : 4791
                Register File           : 0
                L1 Cache                : 0
                L2 Cache                : 0
                Texture Memory          : N/A
                Texture Shared          : N/A
                CBU                     : N/A
                Total                   : 4791
            Double Bit
                Device Memory           : 0
                Register File           : 0
                L1 Cache                : 0
                L2 Cache                : 0
                Texture Memory          : N/A
                Texture Shared          : N/A
                CBU                     : 0
                Total                   : 0
    Retired Pages
        Single Bit ECC                  : 2
        Double Bit ECC                  : 0
        Pending Page Blacklist          : No

GPU 00000000:8A:00.0
    ECC Errors
        Volatile
            Single Bit
                Device Memory           : 0
                Register File           : 0
                L1 Cache                : 0
                L2 Cache                : 0
                Texture Memory          : N/A
                Texture Shared          : N/A
                CBU                     : N/A
                Total                   : 0
            Double Bit
                Device Memory           : 0
                Register File           : 0
                L1 Cache                : 0
                L2 Cache                : 0
                Texture Memory          : N/A
                Texture Shared          : N/A
                CBU                     : 0
                Total                   : 0
        Aggregate
            Single Bit
                Device Memory           : 751
                Register File           : 0
                L1 Cache                : 0
                L2 Cache                : 0
                Texture Memory          : N/A
                Texture Shared          : N/A
                CBU                     : N/A
                Total                   : 751
            Double Bit
                Device Memory           : 0
                Register File           : 0
                L1 Cache                : 0
                L2 Cache                : 0
                Texture Memory          : N/A
                Texture Shared          : N/A
                CBU                     : 0
                Total                   : 0
    Retired Pages
        Single Bit ECC                  : 1
        Double Bit ECC                  : 0
        Pending Page Blacklist          : No

[root@~~~~~~ cuda_memtest-1.2.3]# ./cuda_memtest --stress
[05/12/2020 11:12:34][00555-1029GQ-SOL-TEST-002][0]:Running cuda memtest, version 1.2.2
[05/12/2020 11:12:34][00555-1029GQ-SOL-TEST-002][0]:Warning: Getting serial number failed
[05/12/2020 11:12:34][00555-1029GQ-SOL-TEST-002][0]:NVRM version: NVIDIA UNIX x86_64 Kernel Module 440.33.01 Wed Nov 13 00:00:22 UTC
[05/12/2020 11:12:34][00555-1029GQ-SOL-TEST-002][0]:num_gpus=2
[05/12/2020 11:12:34][00555-1029GQ-SOL-TEST-002][0]:Device name=Tesla V100-SXM2-32GB, global memory size=34089730048
[05/12/2020 11:12:34][00555-1029GQ-SOL-TEST-002][0]:major=7, minor=0
[05/12/2020 11:12:34][00555-1029GQ-SOL-TEST-002][1]:Device name=Tesla V100-SXM2-32GB, global memory size=34089730048
[05/12/2020 11:12:34][00555-1029GQ-SOL-TEST-002][1]:major=7, minor=0
[05/12/2020 11:12:34][00555-1029GQ-SOL-TEST-002][1]:Attached to device 1 successfully.
[05/12/2020 11:12:35][00555-1029GQ-SOL-TEST-002][0]:Attached to device 0 successfully.
[05/12/2020 11:12:35][00555-1029GQ-SOL-TEST-002][1]:Allocated 32162 MB
[05/12/2020 11:12:35][00555-1029GQ-SOL-TEST-002][1]:Test10 [Memory stress test]
[05/12/2020 11:12:35][00555-1029GQ-SOL-TEST-002][1]:Test10 with pattern=0xc0f584703429a73
[05/12/2020 11:12:35][00555-1029GQ-SOL-TEST-002][0]:Allocated 32162 MB
[05/12/2020 11:12:35][00555-1029GQ-SOL-TEST-002][0]:Test10 [Memory stress test]
[05/12/2020 11:12:35][00555-1029GQ-SOL-TEST-002][0]:Test10 with pattern=0xc0f584703429a73
[05/12/2020 11:12:46][00555-1029GQ-SOL-TEST-002][1]:Test10 finished in 11.7 seconds
[05/12/2020 11:12:46][00555-1029GQ-SOL-TEST-002][1]:Test10 [Memory stress test]
[05/12/2020 11:12:46][00555-1029GQ-SOL-TEST-002][1]:Test10 with pattern=0x445349ae27c250eb
[05/12/2020 11:12:46][00555-1029GQ-SOL-TEST-002][0]:Test10 finished in 11.6 seconds
[05/12/2020 11:12:46][00555-1029GQ-SOL-TEST-002][0]:Test10 [Memory stress test]
[05/12/2020 11:12:46][00555-1029GQ-SOL-TEST-002][0]:Test10 with pattern=0x445349ae27c250eb
[05/12/2020 11:12:58][00555-1029GQ-SOL-TEST-002][1]:Test10 finished in 11.7 seconds
[05/12/2020 11:12:58][00555-1029GQ-SOL-TEST-002][1]:Test10 [Memory stress test]
[05/12/2020 11:12:58][00555-1029GQ-SOL-TEST-002][1]:Test10 with pattern=0x1911c7932ca08f3f
[05/12/2020 11:12:58][00555-1029GQ-SOL-TEST-002][0]:Test10 finished in 11.6 seconds
[05/12/2020 11:12:58][00555-1029GQ-SOL-TEST-002][0]:Test10 [Memory stress test]
[05/12/2020 11:12:58][00555-1029GQ-SOL-TEST-002][0]:Test10 with pattern=0x1911c7932ca08f3f
[05/12/2020 11:13:10][00555-1029GQ-SOL-TEST-002][1]:Test10 finished in 11.7 seconds
[05/12/2020 11:13:10][00555-1029GQ-SOL-TEST-002][1]:Test10 [Memory stress test]
[05/12/2020 11:13:10][00555-1029GQ-SOL-TEST-002][1]:Test10 with pattern=0x6dd04578318ecd93
[05/12/2020 11:13:10][00555-1029GQ-SOL-TEST-002][0]:Test10 finished in 11.6 seconds
[05/12/2020 11:13:10][00555-1029GQ-SOL-TEST-002][0]:Test10 [Memory stress test]
[05/12/2020 11:13:10][00555-1029GQ-SOL-TEST-002][0]:Test10 with pattern=0x6dd04578318ecd93
[05/12/2020 11:13:21][00555-1029GQ-SOL-TEST-002][0]:Test10 finished in 11.6 seconds
[05/12/2020 11:13:21][00555-1029GQ-SOL-TEST-002][0]:Test10 [Memory stress test]
[05/12/2020 11:13:21][00555-1029GQ-SOL-TEST-002][1]:Test10 finished in 11.7 seconds
[05/12/2020 11:13:21][00555-1029GQ-SOL-TEST-002][1]:Test10 [Memory stress test]
[05/12/2020 11:13:21][00555-1029GQ-SOL-TEST-002][0]:Test10 with pattern=0x262436df55fe840b
[05/12/2020 11:13:21][00555-1029GQ-SOL-TEST-002][1]:Test10 with pattern=0x262436df55fe840b
[05/12/2020 11:13:33][00555-1029GQ-SOL-TEST-002][0]:Test10 finished in 11.6 seconds
[05/12/2020 11:13:33][00555-1029GQ-SOL-TEST-002][0]:Test10 [Memory stress test]
[05/12/2020 11:13:33][00555-1029GQ-SOL-TEST-002][0]:Test10 with pattern=0x7ae2b0c45adcc25f
[05/12/2020 11:13:33][00555-1029GQ-SOL-TEST-002][1]:Test10 finished in 11.7 seconds
[05/12/2020 11:13:33][00555-1029GQ-SOL-TEST-002][1]:Test10 [Memory stress test]
[05/12/2020 11:13:33][00555-1029GQ-SOL-TEST-002][1]:Test10 with pattern=0x7ae2b0c45adcc25f
[05/12/2020 11:13:45][00555-1029GQ-SOL-TEST-002][0]:Test10 finished in 11.6 seconds
[05/12/2020 11:13:45][00555-1029GQ-SOL-TEST-002][0]:Test10 [Memory stress test]
[05/12/2020 11:13:45][00555-1029GQ-SOL-TEST-002][0]:Test10 with pattern=0x4fa132a95fbb00b3
[05/12/2020 11:13:45][00555-1029GQ-SOL-TEST-002][1]:Test10 finished in 11.7 seconds
[05/12/2020 11:13:45][00555-1029GQ-SOL-TEST-002][1]:Test10 [Memory stress test]
[05/12/2020 11:13:45][00555-1029GQ-SOL-TEST-002][1]:Test10 with pattern=0x4fa132a95fbb00b3
[05/12/2020 11:13:56][00555-1029GQ-SOL-TEST-002][0]:Test10 finished in 11.6 seconds
[05/12/2020 11:13:56][00555-1029GQ-SOL-TEST-002][0]:Test10 [Memory stress test]
[05/12/2020 11:13:56][00555-1029GQ-SOL-TEST-002][0]:Test10 with pattern=0x7f52010043abb2b
[05/12/2020 11:13:56][00555-1029GQ-SOL-TEST-002][1]:Test10 finished in 11.7 seconds

#during cuda-memtest nvidia-smi result

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
| N/A   47C    P0   122W / 300W |  32480MiB / 32510MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
| N/A   49C    P0   126W / 300W |  32480MiB / 32510MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    132213      C   ./cuda_memtest                             32469MiB |
|    1    132213      C   ./cuda_memtest                             32469MiB |
+-----------------------------------------------------------------------------+