I have CUDA 10.0.130 installed on Windows 10 with a GTX 1070 Ti. I ran several of the CUDA samples, and some of them fail. The results are not consistent: the same sample can produce a different outcome each time I rerun it. I have no idea what is causing this and would very much appreciate your help.
reduction.exe
reduction.exe Starting…
GPU Device 0: “GeForce GTX 1070 Ti” with compute capability 6.1
Using Device 0: GeForce GTX 1070 Ti
Reducing array of type int
16777216 elements
256 threads (max)
64 blocks
reduction.cpp(264) : getLastCudaError() CUDA error : Kernel execution failed : (77) an illegal memory access was encountered.
********************************************** (With cuda-memcheck, execution takes much longer to complete and the result is inconsistent: it sometimes passes and sometimes fails.)
cuda-memcheck reduction.exe
========= CUDA-MEMCHECK
reduction.exe Starting…
GPU Device 0: “GeForce GTX 1070 Ti” with compute capability 6.1
Using Device 0: GeForce GTX 1070 Ti
Reducing array of type int
16777216 elements
256 threads (max)
64 blocks
Reduction, Throughput = 0.0554 GB/s, Time = 1.21219 s, Size = 16777216 Elements, NumDevsUsed = 1, Workgroup = 256
GPU result = 2139095040
CPU result = 2139095040
Test passed
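The run-to-run variation above (a failure on the plain run, a pass under cuda-memcheck) can be quantified by launching the same sample several times and counting the outcomes. The following is a minimal sketch, not part of the CUDA samples: the helper name `tally_runs` is hypothetical, and it assumes the sample reports failure through a nonzero exit code.

```python
import subprocess
from collections import Counter


def tally_runs(cmd, runs=10, timeout=300):
    """Run `cmd` (a list of arguments) `runs` times and count the exit codes.

    Assumption: the program signals failure with a nonzero exit code.
    A run that exceeds `timeout` seconds is counted under "timeout".
    """
    counts = Counter()
    for _ in range(runs):
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,  # discard output; only the exit code matters here
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            counts[proc.returncode] += 1
        except subprocess.TimeoutExpired:
            counts["timeout"] += 1
    return counts
```

For example, `tally_runs(["reduction.exe"], runs=20)` returns a mapping from exit code to the number of runs that ended with it; prefixing the command list with `"cuda-memcheck"` would give the same tally for the instrumented run.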
simpleCUBLAS.exe
GPU Device 0: “GeForce GTX 1070 Ti” with compute capability 6.1
simpleCUBLAS test running…
!!! device access error (read C)
********************************************** (2nd time)
simpleCUBLAS.exe
GPU Device 0: “GeForce GTX 1070 Ti” with compute capability 6.1
simpleCUBLAS test running…
simpleCUBLAS test failed.
********************************************** (Adding cuda-memcheck)
cuda-memcheck simpleCUBLAS.exe
========= CUDA-MEMCHECK
GPU Device 0: “GeForce GTX 1070 Ti” with compute capability 6.1
simpleCUBLAS test running…
!!! device access error (read C)
========= Invalid shared read of size 16
========= at 0x00000090 in sgemm_32x32x32_NN
========= by thread (0,0,0) in block (2,2,0)
========= Address 0x00004200 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:C:\WINDOWS\system32\nvcuda.dll (cuMemcpy2DAsync + 0x1b9ff9) [0x1c8735]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasGemmStridedBatchedEx + 0x1e37c) [0x45de8c]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasGemmStridedBatchedEx + 0x21334) [0x460e44]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasZtrttp + 0x80d4b) [0x39b51b]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasGemmStridedBatchedEx + 0x6434) [0x445f44]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasZhpr2_v2 + 0x5672) [0x206862]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasSgemm_v2 + 0x5dd) [0x20788d]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (main + 0x4bd) [0xf26dd]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (invoke_main + 0x34) [0xf37e4]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (__scrt_common_main_seh + 0x127) [0xf3687]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (__scrt_common_main + 0xe) [0xf354e]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (mainCRTStartup + 0x9) [0xf3809]
========= Host Frame:C:\WINDOWS\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x181f4]
========= Host Frame:C:\WINDOWS\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x6a251]
========= Invalid shared read of size 0
========= at 0x00000060 in sgemm_32x32x32_NN
========= by thread (113,0,0) in block (1,0,0)
========= Address 0x00004200 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:C:\WINDOWS\system32\nvcuda.dll (cuMemcpy2DAsync + 0x1b9ff9) [0x1c8735]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasGemmStridedBatchedEx + 0x1e37c) [0x45de8c]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasGemmStridedBatchedEx + 0x21334) [0x460e44]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasZtrttp + 0x80d4b) [0x39b51b]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasGemmStridedBatchedEx + 0x6434) [0x445f44]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasZhpr2_v2 + 0x5672) [0x206862]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasSgemm_v2 + 0x5dd) [0x20788d]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (main + 0x4bd) [0xf26dd]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (invoke_main + 0x34) [0xf37e4]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (__scrt_common_main_seh + 0x127) [0xf3687]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (__scrt_common_main + 0xe) [0xf354e]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (mainCRTStartup + 0x9) [0xf3809]
========= Host Frame:C:\WINDOWS\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x181f4]
========= Host Frame:C:\WINDOWS\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x6a251]
========= Program hit cudaErrorLaunchFailure (error 4) due to “unspecified launch failure” on CUDA API call to cudaMemcpy.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\WINDOWS\system32\nvcuda.dll (cuMemcpy2DAsync + 0x2fa12f) [0x30886b]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasGemmStridedBatchedEx + 0x217a0) [0x4612b0]
========= Host Frame:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cublas64_100.dll (cublasGetVector + 0x224) [0x103444]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (main + 0x558) [0xf2778]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (invoke_main + 0x34) [0xf37e4]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (__scrt_common_main_seh + 0x127) [0xf3687]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (__scrt_common_main + 0xe) [0xf354e]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\simpleCUBLAS.exe (mainCRTStartup + 0x9) [0xf3809]
========= Host Frame:C:\WINDOWS\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x181f4]
========= Host Frame:C:\WINDOWS\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x6a251]
========= ERROR SUMMARY: 3 errors
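Because the reported errors differ between runs, it can help to reduce each cuda-memcheck log to a short tally that is easy to compare. The sketch below is an illustration only: the function name `summarize_memcheck` is hypothetical, and the parsing assumes the line layout seen in the log above ("Invalid ..." followed by "at 0x... in <kernel>", and "Program hit <error> ... on CUDA API call to <function>").

```python
import re
from collections import Counter

PREFIX = "========="  # cuda-memcheck prefixes its own lines with this marker


def summarize_memcheck(log_text):
    """Count cuda-memcheck error records as (description, location) pairs."""
    counts = Counter()
    pending = None  # access error waiting for its "at ... in <kernel>" line
    for raw in log_text.splitlines():
        if not raw.startswith(PREFIX):
            continue  # application output, not a cuda-memcheck line
        line = raw[len(PREFIX):].strip()
        if line.startswith("Invalid "):
            pending = line
        elif pending is not None and line.startswith("at "):
            m = re.match(r"at 0x[0-9a-fA-F]+ in (\S+)", line)
            if m:
                counts[(pending, m.group(1))] += 1
            pending = None
        elif line.startswith("Program hit "):
            m = re.match(r"Program hit (\w+) .*on CUDA API call to (\w+)", line)
            if m:
                counts[(m.group(1), m.group(2))] += 1
    return counts
```

Applied to the simpleCUBLAS log above, this would group the two invalid shared reads under the kernel `sgemm_32x32x32_NN` and the launch failure under `cudaMemcpy`, which makes it easier to see whether the same kernel is implicated on every run.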
matrixMul.exe
[Matrix Multiply Using CUDA] - Starting…
GPU Device 0: “GeForce GTX 1070 Ti” with compute capability 6.1
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
done
CUDA error at D:/Downloads/VatesSetup/cuda-samples-master/Samples/matrixMul/matrixMul.cu:226 code=77(cudaErrorIllegalAddress) “cudaEventSynchronize(stop)”
********************************************** (Adding cuda-memcheck)
cuda-memcheck matrixMul.exe
========= CUDA-MEMCHECK
[Matrix Multiply Using CUDA] - Starting…
GPU Device 0: “GeForce GTX 1070 Ti” with compute capability 6.1
MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel…
done
CUDA error at D:/Downloads/VatesSetup/cuda-samples-master/Samples/matrixMul/matrixMul.cu:226 code=4(cudaErrorLaunchFailure) “cudaEventSynchronize(stop)”
========= Program hit cudaErrorLaunchFailure (error 4) due to “unspecified launch failure” on CUDA API call to cudaEventSynchronize.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\WINDOWS\system32\nvcuda.dll (cuD3D9UnmapVertexBuffer + 0x2e2c85) [0x2f105b]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\matrixMul.exe (cudaEventSynchronize + 0x103) [0xf843]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\matrixMul.exe (MatrixMultiply + 0x638) [0x65798]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\matrixMul.exe (main + 0x283) [0x65e53]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\matrixMul.exe (invoke_main + 0x34) [0x6a374]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\matrixMul.exe (__scrt_common_main_seh + 0x127) [0x6a237]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\matrixMul.exe (__scrt_common_main + 0xe) [0x6a0fe]
========= Host Frame:D:\Downloads\VatesSetup\cuda-samples-master\bin\win64\Debug\matrixMul.exe (mainCRTStartup + 0x9) [0x6a399]
========= Host Frame:C:\WINDOWS\System32\KERNEL32.DLL (BaseThreadInitThunk + 0x14) [0x181f4]
========= Host Frame:C:\WINDOWS\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x6a251]
========= ERROR SUMMARY: 1 error
Following some posts, I have also increased the WDDM TDR delay to 10. This is the output of nvidia-smi.exe:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 417.35       Driver Version: 417.35       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107… WDDM    | 00000000:01:00.0  On |                  N/A |
| 41%   40C    P8    13W / 180W |    281MiB /  8192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+