The code runs successfully on a Tesla K40 GPU, but it fails on a Tesla M2090 GPU.
I also tried compiling it with "nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35", but I still get the errors below:
% cuda-memcheck ./Release/Bptrain config.txt
========= CUDA-MEMCHECK
Use GPU precision=float type
Init DNN weight file loaded:
For Test file:
Start the programming:DNN
Total GPU Device : 2 | Use GPU Device : 0
========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaFreeHost.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2f31b3]
========= Host Frame:./Release/DNN [0x7ce26]
========= Host Frame:./Release/DNN [0x327ef]
========= Host Frame:./Release/DNN [0x373d3]
========= Host Frame:./Release/DNN [0x395da]
========= Host Frame:./Release/DNN [0x5bdd]
========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21ec5]
========= Host Frame:./Release/DNN [0x5d0f]
=========
Created net with 3 hiddenlayers, 25 parallel, lrate = 0.0050.
========= Error: process didn't terminate successfully
========= The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= Internal error (20)
========= No CUDA-MEMCHECK results found
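One plausible reading of the memcheck hint about "dereferencing Unified Memory" is that the code relies on Unified Memory (cudaMallocManaged). That feature requires compute capability 3.0 or newer. The K40 is 3.5 and the M2090 is 2.0, and no -gencode flag can add a hardware feature the GPU lacks. If that diagnosis is right, the cudaFreeHost "invalid argument" would also fit. cudaFreeHost only accepts pointers from cudaMallocHost/cudaHostAlloc, so calling it on anything else fails this way.

The sketch below assumes that diagnosis and is not the original program's code. It queries the device at runtime and falls back to pinned host memory when managed memory is unavailable. It also records which allocator was used so the matching free call is made. The names HostBuffer, ccSupportsManaged, deviceSupportsManaged, allocBuffer and freeBuffer are hypothetical.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical buffer type: remembers which allocator produced the pointer
// so the matching deallocator is used later.
struct HostBuffer {
    float* ptr = nullptr;
    bool managed = false;  // true: cudaMallocManaged, false: cudaMallocHost
};

// Rule of thumb: Unified Memory needs compute capability 3.0 or newer
// (Tesla M2090 = 2.0, Tesla K40 = 3.5).
bool ccSupportsManaged(int major) { return major >= 3; }

// Asks the runtime directly; prop.managedMemory is 0 on devices
// without Unified Memory support.
bool deviceSupportsManaged(int dev) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return false;
    std::printf("Device %d: %s, compute capability %d.%d, managedMemory=%d\n",
                dev, prop.name, prop.major, prop.minor, prop.managedMemory);
    return prop.managedMemory != 0;
}

// Allocates n floats: managed memory when the device supports it,
// otherwise pinned host memory.
cudaError_t allocBuffer(HostBuffer& b, std::size_t n, int dev) {
    b.managed = deviceSupportsManaged(dev);
    void* p = nullptr;
    cudaError_t err = b.managed
        ? cudaMallocManaged(&p, n * sizeof(float))
        : cudaMallocHost(&p, n * sizeof(float));
    b.ptr = (err == cudaSuccess) ? static_cast<float*>(p) : nullptr;
    return err;
}

// Frees with the call that matches the allocator. cudaFreeHost on a pointer
// that did not come from cudaMallocHost/cudaHostAlloc returns an
// invalid-argument error, as in the log above.
cudaError_t freeBuffer(HostBuffer& b) {
    if (b.ptr == nullptr) return cudaSuccess;  // nothing was allocated
    cudaError_t err = b.managed ? cudaFree(b.ptr) : cudaFreeHost(b.ptr);
    b.ptr = nullptr;
    return err;
}
```

Usage would be allocBuffer(buf, n, 0) after selecting device 0, then freeBuffer(buf) at shutdown, checking each returned cudaError_t. On the pinned-memory fallback path, the kernels would typically also need explicit device buffers (cudaMalloc plus cudaMemcpy), because the managed path relied on the runtime migrating data automatically.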