The code runs successfully on a K40 GPU, but it fails on a Tesla M2090 GPU.
I also tried compiling it with "nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35", but I still get the errors below:
% cuda-memcheck ./Release/Bptrain config.txt
========= CUDA-MEMCHECK
Use GPU precision=float type
Init DNN weight file loaded:
For Test file:
Start the programming:DNN
Total GPU Device : 2 | Use GPU Device : 0
========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaFreeHost.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2f31b3]
=========     Host Frame:./Release/DNN [0x7ce26]
=========     Host Frame:./Release/DNN [0x327ef]
=========     Host Frame:./Release/DNN [0x373d3]
=========     Host Frame:./Release/DNN [0x395da]
=========     Host Frame:./Release/DNN [0x5bdd]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21ec5]
=========     Host Frame:./Release/DNN [0x5d0f]
=========
Created net with 3 hiddenlayers, 25 parallel, lrate = 0.0050.
========= Error: process didn't terminate successfully
========= The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= Internal error (20)
========= No CUDA-MEMCHECK results found
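Since the log points at cudaFreeHost, one way to narrow this down might be to check every runtime call and to free pinned buffers only when they were actually allocated. The snippet below is only a minimal sketch of that idea, not the code from the DNN program: the CUDA_CHECK macro, the freePinned helper, and the buffer size are hypothetical names and values made up for illustration. It assumes the failing pointer was meant to come from cudaMallocHost or cudaHostAlloc.

```cuda
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: report the failing call and stop on any runtime error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,  \
                         __LINE__, #call, cudaGetErrorString(err_));  \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: free a pinned buffer only if it was allocated,
// then reset the pointer so that a second call does nothing.
void freePinned(float*& p) {
    if (p != NULL) {
        CUDA_CHECK(cudaFreeHost(p));
        p = NULL;
    }
}

int main() {
    int dev = 0;  // the log above shows device 0 in use
    CUDA_CHECK(cudaSetDevice(dev));

    // Print the compute capability so K40 and M2090 runs can be compared.
    cudaDeviceProp prop;
    CUDA_CHECK(cudaGetDeviceProperties(&prop, dev));
    std::printf("Device %d: %s, compute capability %d.%d\n",
                dev, prop.name, prop.major, prop.minor);

    // Allocate pinned host memory and check the result before using it.
    float* buf = NULL;
    CUDA_CHECK(cudaMallocHost((void**)&buf, 1024 * sizeof(float)));

    freePinned(buf);
    freePinned(buf);  // no-op: the pointer was reset after the first free
    return 0;
}
```

If an allocation fails on the M2090 but not on the K40, the check after cudaMallocHost should report it before cudaFreeHost is ever reached; if the allocation succeeds, a double free or a pointer that came from malloc/new would be the next thing to look for.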