Mixing architectures with drivers that support CUDA 9.2

I have a computer with a Titan Xp Collector’s Edition and a GTX 970. In the past, our CUDA application worked without error using both GPUs. We recently updated to the latest drivers, and the application no longer works: it behaves oddly, generally crashing but sometimes reporting an out-of-date driver error.

I built another application that just displays information about every CUDA-capable GPU in a system, and it errors as well.

The following is the source:

#include <iostream>

#include <cuda.h>
#include <cuda_runtime.h>

#if _WIN32
#include <Windows.h>
#endif

//--------------------------------------------------------------------------------------------------
/// @brief Pause the application if run within its own console window.
void pause()
{
    bool pause = false;
    #if _WIN32
    HWND consoleWnd = GetConsoleWindow();
    DWORD dwProcessId;
    GetWindowThreadProcessId(consoleWnd, &dwProcessId);
    if (GetCurrentProcessId() == dwProcessId) pause = true;
    #endif

    if (pause)
    {
        std::cout << std::endl << "Press [ENTER] to exit..." << std::flush;
        std::cin.ignore(1);
    }
} // end pause

//--------------------------------------------------------------------------------------------------
/// @brief Macro to check for and report CUDA errors.
/// @param name - Name (or title) of the call being executed
/// @param call - The cuda function the macro should execute and check
#define CUDA_CHECK(name, call) \
{ \
    const cudaError_t r = (call); \
    std::cout << name << ": " << cudaGetErrorString(r) << std::endl; \
    if (r != cudaSuccess) \
    { \
        pause(); \
        return 1; \
    } \
} // end CUDA_CHECK

//--------------------------------------------------------------------------------------------------
int main()
{
    // determine the number of CUDA GPUs
    int count = 0;
    CUDA_CHECK("Device count", cudaGetDeviceCount(&count));
    std::cout << count << " CUDA devices available" << std::endl;

    // display stats on each GPU
    cudaDeviceProp prop;
    size_t memory;
    for (int i = 0; i < count; ++i)
    {
        CUDA_CHECK("Device properties", cudaGetDeviceProperties(&prop, i));
        CUDA_CHECK("Set GPU", cudaSetDevice(i));
        CUDA_CHECK("Memory info", cudaMemGetInfo(NULL, &memory));

        int cores = 0;
        switch (prop.major)
        {
        case 1: cores = prop.multiProcessorCount * 8; break;                            // Tesla (not supported starting CUDA 7.0)
        case 2: cores = prop.multiProcessorCount * (prop.minor == 0 ? 32 : 48); break;  // Fermi (not supported starting CUDA 9.2)
        case 3: cores = prop.multiProcessorCount * 192; break;                          // Kepler
        case 5: cores = prop.multiProcessorCount * 128; break;                          // Maxwell
        case 6: cores = prop.multiProcessorCount * (prop.minor == 0 ? 64 : 128); break; // Pascal
        case 7: cores = prop.multiProcessorCount * 64; break;                           // Volta
        }

        std::cout << "GPU #" << i << ": " << prop.name << std::endl
                  << "        Compute capability: " << prop.major << "." << prop.minor << std::endl
                  << "        Multiprocessors:    " << prop.multiProcessorCount << std::endl
                  << "        Cores:              " << cores << std::endl
                  << "        Clock rate:         " << prop.clockRate * 1.0e-6 << " GHz" << std::endl // KHz -> GHz
                  << "        Memory:             " << memory * 1e-9 << " GB" << std::endl
                  << "        Ratio 32 vs. 64:    " << prop.singleToDoublePrecisionPerfRatio << ":1" << std::endl;
    }

    // display driver version and application runtime version
    int driver, runtime;
    CUDA_CHECK("Driver version", cudaDriverGetVersion(&driver));
    CUDA_CHECK("Runtime version", cudaRuntimeGetVersion(&runtime));
    std::cout << "Driver version:  " << driver <<std::endl
              << "Runtime version: " << runtime << std::endl;

    pause();
    return 0;
}

And the response from the application is:

Device count: no error
2 CUDA devices available
Device properties: no error
Set GPU: no error
Memory info: no error
GPU #0: TITAN Xp COLLECTORS EDITION
        Compute capability: 6.1
        Multiprocessors:    30
        Cores:              3840
        Clock rate:         1.582 GHz
        Memory:             12.8849 GB
        Ratio 32 vs. 64:    32:1
Device properties: no error
Set GPU: no error
Memory info: out of memory

Is this configuration not supported, or is there an error in the installation or configuration? Also, I have tried pairing the Titan Xp Collector’s Edition with a Titan Black and get the same behavior, but when I pair two Titan Xps everything works fine. Thanks for any help.

Is this on Windows?

I doubt this is the cause of your observations, but I’m pretty sure it’s not correct to pass NULL here:

CUDA_CHECK("Memory info", cudaMemGetInfo(NULL, &memory));

I’m not sure why you think that would be acceptable.

This sort of instability (e.g. odd error codes) may be due to stack corruption in your application. A different driver/runtime library can certainly behave differently in the presence of stack corruption in the calling environment.
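To give a made-up illustration of the kind of thing I mean (this is not taken from your code): a local buffer overrun like the one below quietly tramples whatever the compiler happened to place next to it on the stack, and the symptoms only show up later, in calls that look completely unrelated:

#include <cstring>
#include <iostream>

int main()
{
    int  device = 0;        // state the rest of the program relies on
    char name[8];           // too small for the string copied into it below

    // Undefined behavior: 27 characters plus the terminator are written into
    // an 8-byte buffer; the excess bytes land in adjacent stack memory,
    // possibly in 'device', possibly in saved registers or the return address.
    std::strcpy(name, "TITAN Xp COLLECTORS EDITION");

    // Everything executed from here on runs in a corrupted environment, and
    // different library/driver versions can fail in completely different ways.
    std::cout << "device = " << device << std::endl;
    return 0;
}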

Yes, this is on Windows, 7 and 10 to be exact. It is compiled with Visual Studio 2015 and linked against CUDA runtime version 8.0.

Typically in C, when a function takes output pointers, you can do this: the function checks for NULL and simply doesn’t write to that argument. Also, this application worked exactly as expected on a system with two Titan Xps.
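For reference, the kind of implementation I was assuming looks something like the sketch below; it is purely illustrative, with a made-up function name, and obviously not how cudaMemGetInfo is actually written:

#include <cstddef>
#include <iostream>

// Illustrative only: an out-parameter API that tolerates NULL by skipping the
// corresponding write. The name and the values are invented for this sketch.
void get_memory_info(std::size_t* free_bytes, std::size_t* total_bytes)
{
    if (free_bytes)  *free_bytes  = 512u * 1024u * 1024u;  // pretend 512 MiB are free
    if (total_bytes) *total_bytes = 1024u * 1024u * 1024u; // pretend 1 GiB total
}

int main()
{
    std::size_t total = 0;
    get_memory_info(NULL, &total); // caller only cares about the total
    std::cout << total << " bytes total" << std::endl;
    return 0;
}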

That said, I changed that line to pass the address of a variable for the free memory and received the exact same response.

size_t free;
CUDA_CHECK("Memory info", cudaMemGetInfo(&free, &memory));

So, I am inclined to believe that this application is not directly causing a stack corruption. I am fully willing to try any other ideas you may have.

I was able to find the issue. I downgraded the drivers to a version from before CUDA 9.0 was supported, and everything works as expected.