CUDA version affects inference results during batching

We are running an application in an Ubuntu Jammy Docker image on an NVIDIA Quadro P2200, with driver version 515.65.01 and CUDA 11.7. We perform classification using linear regression, and the inference results are unstable depending on the CUDA version we are running.

We perform inference in batches, processing 8 lines from the image at a time. The issue only occurs when batching is enabled: deactivating batching fixes the problem, but we depend on batching because of processing constraints. The inference model is an ONNX model generated from TensorFlow and loaded with OpenCV. Running inference in Python with onnxruntime yields correct results, which rules out the model itself as the source of the problem.
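
For reference, here is a minimal sketch of what the batched OpenCV DNN path and the onnxruntime cross-check look like. The model path, input shape, and dummy data below are placeholders, not our actual pipeline:

```python
import cv2
import numpy as np
import onnxruntime as ort

MODEL_PATH = "model.onnx"   # placeholder path to the TensorFlow-exported ONNX model
BATCH_SIZE = 8              # we process 8 image lines per batch

# Load the ONNX model through OpenCV DNN and select the CUDA backend/target.
net = cv2.dnn.readNetFromONNX(MODEL_PATH)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# Stack 8 image lines into a single 4D blob (NCHW) and run one forward pass.
lines = [np.random.rand(32, 256).astype(np.float32) for _ in range(BATCH_SIZE)]  # dummy lines
blob = cv2.dnn.blobFromImages(lines)
net.setInput(blob)
cv_out = net.forward()

# Cross-check against onnxruntime, which gives correct results for us.
sess = ort.InferenceSession(MODEL_PATH, providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
ort_out = sess.run(None, {input_name: blob})[0]

print("max abs difference (OpenCV vs onnxruntime):", np.abs(cv_out - ort_out).max())
```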

The issue shows up as the classified images sent through inference never yielding updated results, i.e., the annotated pixels are not updated and no inference appears to be performed on them. Instead, every pixel of the resulting image is annotated with a single class (defined as foreign) and never changes, no matter how many images we annotate and send through inference.
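
One way to make the symptom visible is to run the same 8 lines through both the batched path and a per-line path with batch size 1 (the mode that works for us) and compare the predicted classes. This is a hedged sketch reusing the placeholder model and dummy data from above, and it assumes the network output has shape (batch, num_classes):

```python
import cv2
import numpy as np

MODEL_PATH = "model.onnx"   # placeholder path
BATCH_SIZE = 8

net = cv2.dnn.readNetFromONNX(MODEL_PATH)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

lines = [np.random.rand(32, 256).astype(np.float32) for _ in range(BATCH_SIZE)]  # dummy lines

# Batched forward pass: all 8 lines in one blob.
net.setInput(cv2.dnn.blobFromImages(lines))
batched = net.forward()

# Per-line forward passes (batch size 1), which behave correctly for us.
per_line = []
for line in lines:
    net.setInput(cv2.dnn.blobFromImage(line))
    per_line.append(net.forward())
per_line = np.vstack(per_line)

# In the failing configurations, the batched predictions are stuck on one class
# while the per-line predictions vary as expected.
print("batched classes: ", batched.argmax(axis=1))
print("per-line classes:", per_line.argmax(axis=1))
```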

We have been encountering this issue for a while and have tested the application on multiple computers with varying CUDA versions. Interestingly, most of the computers obtained correct results with CUDA 10.2, and some also with CUDA 11.4. All other versions trigger the inference issue during batching.

We get no warnings or error messages in our logs, and there is no indication of a mismatch between any packages or extensions.

Has anyone here encountered the same issue before, or does anyone have insight that could help us debug this problem?

Hi @helene.minge.olsen,

We request that you raise this concern on the CUDA Platform forum to get better assistance.

Thank you