NPP JPEG encoding broken after updating CUDA from 8.0 to 10.1

We have been using JPEG encoding with the support of NPP in our application successfully for quite some time. Our code for encoding raw RGB data largely follows the jpegNPP sample from the SDK. For the quantization matrices, we use the ones recommended in the annex of the JPEG format description.

After updating CUDA from 8.0 to 10.1 on Windows 8.1 and 10 (64 bit), without any changes to our encoding code, the JPEGs show severe visual artifacts. Here are two images for comparison (produced on a GTX 1080; the problem also occurs on a 1080 Ti and a Titan RTX),

with CUDA 8.0:

with CUDA 10.1:

The quantization matrices we use are:

const Npp8u luminanceMatrix[64] = {
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
};

const Npp8u chrominanceMatrix[64] = {
    17, 18, 24, 47, 99, 99, 99, 99,
    18, 21, 26, 66, 99, 99, 99, 99,
    24, 26, 56, 99, 99, 99, 99, 99,
    47, 66, 99, 99, 99, 99, 99, 99,
    99, 99, 99, 99, 99, 99, 99, 99,
    99, 99, 99, 99, 99, 99, 99, 99,
    99, 99, 99, 99, 99, 99, 99, 99,
    99, 99, 99, 99, 99, 99, 99, 99
};

We use the chrominance matrix for the Cr and Cb channels and the luminance matrix for the Y channel, in three separate calls to nppiDCTQuantFwd8x8LS_JPEG_8u16s_C1R_NEW. When we change the matrices to all ones, i.e., no quantization at all, the resulting images look perfect. However, as soon as a single entry greater than 1 appears in one of the quantization matrices, the artifacts show up in the result. As already stated, this did not happen before the CUDA update, so I assume this is not a fault of our implementation.
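To illustrate why all-ones matrices should amount to no quantization, here is a minimal sketch of the textbook quantization step for a single DCT coefficient. This is not NPP's internal implementation, which I cannot see; the function names `quantize` and `dequantize` are hypothetical and only used for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Textbook JPEG quantization of one DCT coefficient:
// divide by the table entry and round to the nearest integer.
// Hypothetical helper, not an NPP function.
inline std::int16_t quantize(float coefficient, std::uint8_t q)
{
    assert(q != 0); // quantization table entries must be nonzero
    return static_cast<std::int16_t>(
        std::lround(coefficient / static_cast<float>(q)));
}

// Decoder side: multiply the stored level by the same table entry.
inline float dequantize(std::int16_t level, std::uint8_t q)
{
    return static_cast<float>(level) * static_cast<float>(q);
}
```

With a table entry of 1, an integer-valued coefficient survives the round trip unchanged, which matches what we see with the all-ones matrices. With an entry of 16, a coefficient of 37 is stored as level 2 and reconstructed as 32, so some loss is expected, but nothing like the artifacts we get.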

Before quantization, we apply a zigzag transformation to the matrices:

const Npp8u zigzag[64] = {
    0, 1, 5, 6, 14, 15, 27, 28,
    2, 4, 7, 13, 16, 26, 29, 42,
    3, 8, 12, 17, 25, 30, 41, 43,
    9, 11, 18, 24, 31, 40, 44, 53,
    10, 19, 23, 32, 39, 45, 52, 54,
    20, 22, 33, 38, 46, 51, 55, 60,
    21, 34, 37, 47, 50, 56, 59, 61,
    35, 36, 48, 49, 57, 58, 62, 63
};

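// table (64-entry output) and chrominance (flag) are declared elsewhere in our code.
const Npp8u* source = chrominance ? chrominanceMatrix : luminanceMatrix;
for (unsigned int i = 0u; i < 64u; i++)
{
    // zigzag[i] is the zigzag position of the row-major entry i
    table[zigzag[i]] = source[i];
}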
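For anyone who wants to check the reordering without NPP installed, the same step can be sketched as a self-contained snippet. `Byte` is a stand-in for `Npp8u`, and `kZigzag`, `kLuminance`, `isPermutation` and `toZigzagOrder` are names made up for this sketch; the table values are the ones listed above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for Npp8u so the sketch compiles without the NPP headers.
using Byte = std::uint8_t;

// kZigzag[i] is the zigzag position of the coefficient at row-major index i.
constexpr std::array<Byte, 64> kZigzag = {
    0, 1, 5, 6, 14, 15, 27, 28,
    2, 4, 7, 13, 16, 26, 29, 42,
    3, 8, 12, 17, 25, 30, 41, 43,
    9, 11, 18, 24, 31, 40, 44, 53,
    10, 19, 23, 32, 39, 45, 52, 54,
    20, 22, 33, 38, 46, 51, 55, 60,
    21, 34, 37, 47, 50, 56, 59, 61,
    35, 36, 48, 49, 57, 58, 62, 63
};

// Luminance quantization matrix in row-major order.
constexpr std::array<Byte, 64> kLuminance = {
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99
};

// Returns true if every zigzag position 0..63 occurs exactly once.
inline bool isPermutation(const std::array<Byte, 64>& order)
{
    std::array<bool, 64> seen{};
    for (Byte pos : order)
    {
        if (pos >= 64 || seen[pos]) return false;
        seen[pos] = true;
    }
    return true;
}

// Reorders a row-major 8x8 quantization matrix into zigzag order.
inline std::array<Byte, 64> toZigzagOrder(const std::array<Byte, 64>& rowMajor)
{
    assert(isPermutation(kZigzag));
    std::array<Byte, 64> table{};
    for (std::size_t i = 0; i < 64; ++i)
    {
        table[kZigzag[i]] = rowMajor[i];
    }
    return table;
}
```

Calling `toZigzagOrder(kLuminance)` yields a table that starts with 16, 11, 12, 14, 12, 10, which is the usual zigzag sequence for the luminance matrix, so the reordering itself looks correct to me.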

I tried removing this transformation to see whether the quantization matrices are no longer expected in zigzag layout, but this only changed the artifacts and did not improve the results.

Does anybody know what is going on? What changed in the NPP library that would make an old implementation incompatible with the new version?