Code runs on K40 GPU but cannot run on Tesla M2090 GPU

The code runs successfully on a K40 GPU, but it fails on a Tesla M2090 GPU.

I also tried compiling it with "nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35", but I still get the errors below:

% cuda-memcheck ./Release/Bptrain config.txt
========= CUDA-MEMCHECK
Use GPU precision=float type
Init DNN weight file loaded:
For Test file:
Start the programming:DNN
Total GPU Device : 2 | Use GPU Device : 0
========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaFreeHost. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2f31b3]
=========     Host Frame:./Release/DNN [0x7ce26]
=========     Host Frame:./Release/DNN [0x327ef]
=========     Host Frame:./Release/DNN [0x373d3]
=========     Host Frame:./Release/DNN [0x395da]
=========     Host Frame:./Release/DNN [0x5bdd]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21ec5]
=========     Host Frame:./Release/DNN [0x5d0f]
=========
Created net with 3 hiddenlayers, 25 parallel, lrate = 0.0050.
========= Error: process didn't terminate successfully
=========        The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= Internal error (20)
========= No CUDA-MEMCHECK results found

You would want to examine the details of the API call cuda-memcheck complains about:

cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaFreeHost.
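
One way to narrow this down is to check the return status of every runtime API call, so that the failing cudaFreeHost is reported with its file and line. The following is only a minimal sketch, not your code: the CHECK_CUDA macro and the h_buf buffer are hypothetical names used for illustration. As far as I know, cudaFreeHost typically returns this error when the pointer did not come from cudaMallocHost/cudaHostAlloc or has already been freed, but treat that as a guess until you see which call site fails.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original code): print the failing
// call with file and line, then abort.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                    #call, cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

int main() {
    float *h_buf = nullptr;  // hypothetical pinned host buffer

    // Pinned host memory must be allocated with cudaMallocHost (or
    // cudaHostAlloc) if it is later released with cudaFreeHost.
    CHECK_CUDA(cudaMallocHost((void **)&h_buf, 1024 * sizeof(float)));
    h_buf[0] = 1.0f;

    // Free exactly once, then clear the pointer so an accidental second
    // free of the same address is easier to spot.
    CHECK_CUDA(cudaFreeHost(h_buf));
    h_buf = nullptr;

    return 0;
}
```

Wrapping the existing allocation and free calls this way should show whether an earlier call already failed on the M2090 before cudaFreeHost was reached.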

Without access to your code, I am not sure how anybody can render assistance here. I am a bit puzzled as to why there would be a failure on cudaFreeHost, but as a generic guess, your code may be using features that are not available on compute capability 2.x devices such as the Tesla M2090.
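
To confirm what each of the two devices in your system reports, you could run a small query program. This is a sketch that only uses the standard runtime calls cudaGetDeviceCount and cudaGetDeviceProperties; it makes no assumptions about your application. A Tesla M2090 should report compute capability 2.0 and a K40 should report 3.5, so anything in the code that needs 3.x (Unified Memory, for example, which the cuda-memcheck output mentions) would be a candidate to look at.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Print the name and compute capability of every visible device.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n", dev,
                    cudaGetErrorString(err));
            return 1;
        }
        printf("Device %d: %s, compute capability %d.%d\n", dev, prop.name,
               prop.major, prop.minor);
    }
    return 0;
}
```

Compile it with nvcc and compare the reported values against the -gencode targets used for the build.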