Hi guys, I have a problem when I try to run darknet on my GPU server. This is my environment:
GPU: Tesla T4
OS: CentOS 7.6
Kernel: 5.4.241 (x86_64)
NVIDIA driver: 440.118.02
CUDA Toolkit: 10.2.89
cuDNN: 8.2.2.26-10.2
Docker: 20.10.17
Kubernetes: 1.23.0
I had to upgrade the kernel to version 5.4.241 in order to use k8s. After that I used yum to install the driver, CUDA, and cuDNN. When installing the driver with yum, I had to run the following commands to make sure the driver module was built for the correct (running) kernel:
dkms remove nvidia/440.118.02 --all
dkms install nvidia/440.118.02 -k $(uname -r)
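To confirm that the rebuild actually targeted the running kernel, the output of `dkms status` can be checked. The helper below is only a sketch: the function name `nvidia_dkms_installed` is hypothetical, and it assumes `dkms status` prints one line per module containing the module name, version, kernel release, and the word "installed" (the exact layout differs between dkms versions, so it only matches on substrings).

```shell
#!/bin/sh
# Hypothetical helper (not part of dkms): reads `dkms status` text on stdin
# and succeeds only if some line mentions the nvidia module, the given
# driver version, the given kernel release, and the word "installed".
# Substring matching is used because the line layout varies by dkms version.
nvidia_dkms_installed() {
    version="$1"
    kernel="$2"
    grep -F "nvidia" | grep -F "$version" | grep -F "$kernel" | grep -q "installed"
}

# Usage on the real machine (not executed here):
#   dkms status | nvidia_dkms_installed 440.118.02 "$(uname -r)" \
#       && echo "nvidia module is installed for the running kernel"
```

If the check fails, the module was most likely built only for another kernel (for example the stock CentOS 3.10 kernel), which would be worth ruling out before looking at CUDA itself.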
The installation of CUDA and cuDNN with yum then went smoothly, with no errors. But when I finally built (make) and ran the mnistCUDNN test (the cuDNN v8 sample), it failed with this error:
Executing: mnistCUDNN
cudnnGetVersion() : 8202 , CUDNN_VERSION from cudnn.h : 8202 (8.2.2)
Host compiler version : GCC 4.8.5
ERROR: cuda failure (unknown error) in error_util.h:91
Aborting…
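A possible check, not confirmed as the cause here: a generic CUDA "unknown error" at startup is sometimes reported when the `nvidia_uvm` kernel module is not loaded. The sketch below assumes the usual `lsmod` layout (module name in the first column); the function name `has_module` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsmod` output on stdin and succeeds if a
# module with exactly the given name appears in the first column.
has_module() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on the real machine (not executed here):
#   lsmod | has_module nvidia     || echo "nvidia module not loaded"
#   lsmod | has_module nvidia_uvm || echo "nvidia_uvm module not loaded"
```

The match is on the exact name, so `nvidia_uvm` or `nvidia_drm` being present does not count as `nvidia` being loaded.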
Here is my NVIDIA bug report:
nvidia-bug-report.log.gz (4.2 MB)
Please help me, thanks!