Binary compatibility between similar NVIDIA hardware

Hi.

I have code that runs fine on my GeForce RTX 2080 SUPER and also fine on a friend’s GeForce RTX 2070S OC 8G, but it fails (= crashes upon start) on a GeForce RTX 2080.

The GeForce RTX 2080 SUPER and the GeForce RTX 2080 are very similar, so I looked into the differences between the two. I could not find anything that would explain why the code runs on the former and crashes on the latter.

E.g. the GeForce 20 series Wikipedia article and more …

Is there a common mistake I might have made, or a well-known oversight that has struck here?

“Fail” or “crash” is too vague a description. Before getting into the details of what actually happens, please confirm that the application in question performs proper CUDA error checking: It checks the return status of every API call to CUDA or a CUDA-associated library, and every kernel call.

Does “crash” mean that one of these checks did not return cudaSuccess (or the equivalent for a CUDA-associated library)? If so, what was the API called and what was the return status? If the “crash” is not a failing status check, what is the exact nature of it? A segmentation fault?

A common reason for abnormal program termination is bugs in one’s code, among them uninitialized data, out-of-bounds accesses to allocated data, race conditions, and missing checks for error conditions near their point of origin (for example, failing to check that a dynamic memory allocation succeeded). Have you used appropriate tools (e.g. valgrind) to look for those? When you run the app under the control of cuda-memcheck, does it report any issues?

I use CUDA error checking for the API calls (all cudaSuccess), but not for the kernel call. How do you check the kernel calls?

By “crash” (sorry for the vague phrasing) I mean that the binary just stops/returns seconds after being started, without any message, much as it would if, for example, the number of threads per block were too high.

cuda-memcheck returns no error either.

However valgrind returned the following:

$ valgrind --leak-check=yes ./etsi
...
==96537== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==96537==    This could cause spurious value errors to appear.
==96537==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==96537== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==96537==    This could cause spurious value errors to appear.
==96537==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
...

I found that cudaMallocManaged() was the reason?!

int *hitsn;
HANDLE_ERROR( cudaMallocManaged((void **) &hitsn, sizeof(int)) );
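
(For context: HANDLE_ERROR is an API-status checking macro along the lines of the sketch below; this is not necessarily the exact definition used in my code.)

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Sketch of an API-status checking macro: print the failing call's
// location and error string, then abort.
#define HANDLE_ERROR(call)                                            \
do {                                                                  \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
        fprintf(stderr, "CUDA error in file '%s' in line %i : %s.\n", \
                __FILE__, __LINE__, cudaGetErrorString(err_));        \
        exit(EXIT_FAILURE);                                           \
    }                                                                 \
} while (0)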

Considering that the hardware is virtually identical, I’m running out of ideas.

An invalid configuration for a kernel launch does not generally result in program termination. The issue of abnormal program termination may be unrelated to the GPU, considering that cuda-memcheck reports no issues (you ran the app with cuda-memcheck on the system with the RTX 2080, correct?).
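
To illustrate, here is a minimal sketch with a placeholder kernel: an out-of-range launch configuration is merely reported through the error status, while the process itself keeps running.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy() { }  // placeholder kernel for illustration

int main(void)
{
    // 2048 threads per block exceeds the hardware limit of 1024, so the
    // launch is rejected with an error status; the program does not terminate.
    dummy<<<1, 2048>>>();
    cudaError_t err = cudaGetLastError();
    printf("launch status: %s\n", cudaGetErrorString(err));
    return 0;
}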

I am afraid we won’t make progress in remote failure diagnosis without a minimal reproducer. A good guess for a difference between substantially identical systems is usually that a different amount of available memory causes a memory allocation to fail, because that’s not just a function of system configuration but also of current system state. If allocation failures are not handled gracefully, that could lead to sudden program termination.
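
A sketch of what “handled gracefully” could look like, reusing the allocation from the snippet you posted above:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main(void)
{
    int *hitsn = NULL;
    // Check the allocation right at its point of origin, instead of letting
    // a failed allocation surface later as a mysterious termination.
    cudaError_t err = cudaMallocManaged((void **) &hitsn, sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;   // or fall back / retry, but do not continue blindly
    }
    /* ... use hitsn ... */
    cudaFree(hitsn);
    return EXIT_SUCCESS;
}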

You might also want to look at the kernel execution time (either by using the profiler or timing it yourself). If it is near the operating system’s watchdog timer limit for a GPU connected to a display (usually around 2 seconds), the kernel might timeout on slower GPUs, causing the CUDA context to be torn down, which causes all subsequent CUDA API calls to fail.
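
If you time it yourself, cudaEvent-based timing is the simplest approach; a minimal sketch (my_kernel is a stand-in for your actual kernel and launch configuration):

#include <cstdio>
#include <cuda_runtime.h>

__global__ void my_kernel() { }  // stand-in for the real kernel

int main(void)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    my_kernel<<<1, 1>>>();
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Anything approaching ~2000 ms on a display GPU is in watchdog territory.
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}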

A simplistic way to check kernel launches in CUDA code is to use a macro like this:

// Macro to catch CUDA errors in kernel launches
#define CHECK_LAUNCH_ERROR()                                          \
do {                                                                  \
    /* Check synchronous errors, i.e. pre-launch */                   \
    cudaError_t err = cudaGetLastError();                             \
    if (cudaSuccess != err) {                                         \
        fprintf (stderr, "Cuda error in file '%s' in line %i : %s.\n",\
                 __FILE__, __LINE__, cudaGetErrorString(err) );       \
        exit(EXIT_FAILURE);                                           \
    }                                                                 \
    /* Check asynchronous errors, i.e. kernel failed (ULF) */         \
    err = cudaDeviceSynchronize();                                    \
    if (cudaSuccess != err) {                                         \
        fprintf (stderr, "Cuda error in file '%s' in line %i : %s.\n",\
                 __FILE__, __LINE__, cudaGetErrorString( err) );      \
        exit(EXIT_FAILURE);                                           \
    }                                                                 \
} while (0)
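
Use it immediately after each kernel launch, e.g. (kernel name, launch configuration, and arguments are illustrative):

my_kernel<<<numBlocks, threadsPerBlock>>>(arg1, arg2);
CHECK_LAUNCH_ERROR();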

Side question: The RTX 2080 machine (which fails to run my code) has no CUDA SDK etc. installed. Does the compiled binary require any NVIDIA prerequisites to execute?

It might. It certainly requires a properly installed GPU driver that is sufficiently new to meet the requirements of however the application was compiled (against which CUDA version it was compiled). Beyond that, other requirements might be specific to how the application was built. The usual requirements here would be any expected dynamically linked libraries. However, an application that has a requirement on a dynamically linked library will normally give a fairly explicit message at runtime if it cannot locate that library. So this is probably not the case here. An application might have other requirements that have nothing to do with CUDA.
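
One quick way to compare what the target machine’s driver supports with what the binary was built against is to query both at runtime; a minimal sketch:

#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // highest CUDA version the installed driver supports
    cudaRuntimeGetVersion(&runtimeVersion);  // CUDA runtime version this binary was built against
    printf("driver supports CUDA %d.%d, binary uses CUDA runtime %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    // If the driver's supported version is lower than the runtime version,
    // the driver is too old for this binary.
    return 0;
}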

Getting out a host code debugger and tracing the host code execution up to the point of app exit might be instructive. You could also sprinkle printf or similar calls in the code and rebuild, to localize the point of exit the way a debugger would.


Update … the RTX 2080 machine had (probably, I am not sure) driver 466.11 installed. It was updated to 496.13, and after that the code ran! I’m surprised the driver was the culprit.

The initial error (when the code didn’t run on that machine) was:
the provided PTX was compiled with an unsupported toolchain. in ...

The code was compiled with driver 472.12 and CUDA 11.4.100 installed.

Very confusing, since you previously said that all your API calls returned cudaSuccess.

If you had actually reported this error (“the provided PTX was compiled with an unsupported toolchain”) to begin with, we could have immediately directed you to update your driver.

Not sure why this would be surprising, since generally in a software stack, higher layers require certain properties of lower layers. When things seem out of whack, a common heuristic for resolving the issue is to install the latest available driver package suitable for one’s hardware.