CPU code can run on a K40 GPU but cannot run on a Tesla M2090 GPU

zz_zaoshuxia · May 23, 2016, 2:22pm

The code runs successfully on a K40 GPU, but it fails on a Tesla M2090 GPU.

I also tried compiling it with “nvcc -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35”, but I still get the errors below:

% cuda-memcheck ./Release/Bptrain config.txt
========= CUDA-MEMCHECK
Use GPU precision=float type
Init DNN weight file loaded:
For Test file:
Start the programming:DNN
Total GPU Device : 2 | Use GPU Device : 0
========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaFreeHost. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2f31b3]
=========     Host Frame:./Release/DNN [0x7ce26]
=========     Host Frame:./Release/DNN [0x327ef]
=========     Host Frame:./Release/DNN [0x373d3]
=========     Host Frame:./Release/DNN [0x395da]
=========     Host Frame:./Release/DNN [0x5bdd]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21ec5]
=========     Host Frame:./Release/DNN [0x5d0f]
=========
Created net with 3 hiddenlayers, 25 parallel, lrate = 0.0050.
========= Error: process didn't terminate successfully
=========        The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= Internal error (20)
========= No CUDA-MEMCHECK results found

njuffa · May 23, 2016, 2:54pm

You would want to examine the details of the API call cuda-memcheck complains about:

cudaErrorInvalidValue (error 11) due to “invalid argument” on CUDA API call to cudaFreeHost.

Without access to your code, I am not sure how anybody can render assistance here. I am a bit puzzled as to why there would be a failure on cudaFreeHost, but as a generic guess, your code may be using features not available on compute capability 2.x (the Tesla M2090 is a compute capability 2.0 device, while the K40 is compute capability 3.5).
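
As a starting point for narrowing this down, here is a minimal sketch, not the poster's code. It prints the compute capability of each device and wraps every runtime API call in an error check, so the first failing call is reported with its file and line. The `CHECK_CUDA` macro, the buffer name `host_buf`, and the buffer size are made up for illustration. One possibility (an assumption, nothing in the log confirms it) is that an earlier allocation failed silently on the M2090 and the later cudaFreeHost received a pointer that never came from cudaMallocHost or cudaHostAlloc; checking every return value would expose that.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line and the runtime's error string
// as soon as any CUDA runtime API call returns something other than cudaSuccess.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,      \
                         __LINE__, #call, cudaGetErrorString(err_));      \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

int main() {
    // List every visible device with its compute capability.
    int count = 0;
    CHECK_CUDA(cudaGetDeviceCount(&count));
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        CHECK_CUDA(cudaGetDeviceProperties(&prop, dev));
        std::printf("Device %d: %s, compute capability %d.%d\n",
                    dev, prop.name, prop.major, prop.minor);
    }

    // Allocate and free a small pinned host buffer on device 0,
    // checking each step instead of ignoring the return codes.
    CHECK_CUDA(cudaSetDevice(0));
    float *host_buf = NULL;                  // illustrative name
    const size_t n = 1024;                   // illustrative size
    CHECK_CUDA(cudaMallocHost((void **)&host_buf, n * sizeof(float)));
    CHECK_CUDA(cudaFreeHost(host_buf));
    host_buf = NULL;                         // guard against a second free

    std::printf("Pinned allocation and free succeeded.\n");
    return 0;
}
```

If this small program runs cleanly on the M2090 (built for sm_20 with a toolkit that still supports that architecture), the same kind of check around each cudaMallocHost, cudaHostAlloc, and cudaFreeHost call in the real application should show which call fails first.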