I have code running without any problems on a Quadro M2200 using CUDA 9.0 and CUDA 10.
Running the same code on a GeForce RTX 2080 Ti gives an error when I try to initialize the GPU:
cudaEventCreate(&stop); returns error 46 :
cudaErrorDevicesUnavailable = 46,
/**
* This indicates that the device kernel image is invalid.
*/
On GPUs, there is no binary compatibility between architectures. This means you cannot execute binary code generated for one architecture on a device of a different architecture: the device kernel image is invalid.
You need to build your code for the correct architecture, so for the RTX 2080 Ti compile with -arch=sm_75. If your code needs to execute on devices of more than one architecture, you need to build a “fat binary”, meaning that your program contains kernel images for more than one architecture. See the CUDA documentation for details.
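For a single target architecture, the nvcc invocation is a one-liner (a sketch only; the file name kernel.cu is my own placeholder, substitute your actual source files):

```shell
# Build for the RTX 2080 Ti (Turing, compute capability 7.5)
nvcc -arch=sm_75 -o app kernel.cu
```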
Thank you I will try to rebuild and recompile my project again.
At the same time, I did not transfer the application from one machine to another. I built and compiled my application on the GeForce RTX 2080 Ti, adding to it source code files created earlier. And when I moved this project without any changes back to the M2200, it worked without any issues.
Maybe you are correct; I have to make sure I did not use any *.h files that are incompatible with the new device.
A CUDA binary can contain binary images for specific GPU architectures, as well as multiple instances of PTX targeting virtual architectures. PTX code can be JIT-compiled at application run time into machine code for the architecture of the GPU present. Naturally, I have no knowledge of what kind of object files are embedded in the CUDA binary generated by your build; I am just explaining the general principle.
A typical fat binary contains one binary object for each GPU architecture that the application is designed to support, as well as a PTX object for the latest architecture among these. So in your case, you would want to build a fat binary with binary objects for sm_52 (for the Quadro M2200) and sm_75 (for the RTX 2080) as well as a PTX object for compute_75. Look at the documentation of the -code switch of nvcc.
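A fat-binary build along those lines could look like this (a sketch of the nvcc flags only; kernel.cu stands in for your actual source files):

```shell
# One binary image per supported GPU architecture, plus PTX for the
# newest one so future GPUs can JIT-compile it at run time.
nvcc -gencode arch=compute_52,code=sm_52 \
     -gencode arch=compute_75,code=sm_75 \
     -gencode arch=compute_75,code=compute_75 \
     -o app kernel.cu
```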
Beyond double-checking the build process, make sure you have a sufficiently recent driver package installed that includes support for Turing-family GPUs.
Thank you again; you understood correctly that I’m a beginner, and you are trying to explain the basics to me.
I will try to rebuild my project again on the new machine. I have the most recent CUDA 10 toolkit and VS 2017; previous versions do not support the Turing architecture.
My problem also is that the code runs fine when I’m working with a smaller matrix, but fails when I get to the bigger one on a second run (the first run has no problem). That points to some memory problem rather than a kernel image issue.
What can you say about this error?
/**
* This indicates that all CUDA devices are busy or unavailable at the current
* time. Devices are often busy/unavailable due to use of
* ::cudaComputeModeExclusive, ::cudaComputeModeProhibited or when long
* running CUDA kernels have filled up the GPU and are blocking new work
* from starting. They can also be unavailable due to memory constraints
* on a device that already has active CUDA work being performed.
*/
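Since the comment mentions devices being unavailable "due to memory constraints", one quick diagnostic for the second-run failure is to query free device memory between runs (a minimal sketch, assuming nothing about your code; if the free amount keeps shrinking across runs, a cudaFree is likely missing):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBytes = 0, totalBytes = 0;
    // Reports how much device memory is currently free vs. installed.
    if (cudaMemGetInfo(&freeBytes, &totalBytes) == cudaSuccess) {
        printf("free: %zu MiB of %zu MiB\n",
               freeBytes >> 20, totalBytes >> 20);
    }
    return 0;
}
```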
I am confident that you will be able to track down the root cause behind these issues with a bit of good old-fashioned debugging work. Use proper CUDA error checking. Use cuda-memcheck to help you find issues in the code. Acquaint yourself with the CUDA debugger.
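A common pattern for that error checking wraps every runtime API call in a macro (a sketch only; the macro name checkCuda is my own choice, not an official API):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define checkCuda(call)                                                    \
    do {                                                                   \
        cudaError_t err = (call);                                          \
        if (err != cudaSuccess) {                                          \
            fprintf(stderr, "CUDA error %d (%s) at %s:%d\n",               \
                    err, cudaGetErrorString(err), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

int main() {
    cudaEvent_t stop;
    checkCuda(cudaEventCreate(&stop));   // the call that returned error 46
    checkCuda(cudaEventDestroy(stop));
    return 0;
}
```

Since kernel launches are asynchronous, also check cudaGetLastError() right after a launch and the return value of a subsequent cudaDeviceSynchronize(), or launch-time errors will surface only at some later API call.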