Corrupted JPEGs

When I create JPEG files using NPP functions, the images are occasionally (not often) corrupted; see the example below. The same Huffman tables and coefficients are used for all images, and only the input data changes. The input data itself is not corrupted.

The errors also seem fairly random.

Images are produced in sequence, using 2 threads and 2 corresponding cudaStreams. Init is done only once for each thread/stream; then nppSetStream, nppiDCTQuantFwd8x8LS_JPEG_8u16s_C1R_NEW and nppiEncodeHuffmanScan_JPEG_8u16s_P3R are called for each image. Every resulting JPEG file also has exactly the same header section; only the encoded part changes.

What can cause this?

PS: The example is a screenshot, but I can provide the original file and other details/code if necessary.

I found that the problem is caused by a premature end of the data segment. It seems that nppiEncodeHuffmanScan_JPEG_8u16s_P3R sometimes returns an incorrect length.
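One way to test the suspicion that the reported length is too short is to pre-fill the output buffer with a sentinel byte before encoding (for example with cudaMemset), copy the whole buffer back to the host afterwards, and look for data beyond the reported end. The sketch below shows only the host-side check. The helper names writtenExtent and lengthUnderReported and the sentinel value are hypothetical, not part of NPP, and the check is only a heuristic: scan data that happens to end in the sentinel value makes the estimate slightly low.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns one past the last byte that differs from the
// sentinel, i.e. an estimate of how many bytes were actually written.
std::size_t writtenExtent(const std::vector<std::uint8_t>& buf,
                          std::uint8_t sentinel)
{
    std::size_t end = buf.size();
    while (end > 0 && buf[end - 1] == sentinel) {
        --end;
    }
    return end;
}

// True if the buffer holds data beyond the length the encoder reported,
// which would suggest the reported length is too short.
bool lengthUnderReported(const std::vector<std::uint8_t>& buf,
                         std::size_t reportedLength,
                         std::uint8_t sentinel)
{
    return writtenExtent(buf, sentinel) > reportedLength;
}
```

Here reportedLength would be the total from the Compress function (header + scan + end marker), and buf a host copy of the full output buffer. Logging the frames for which the check fires might make the issue easier to capture for a repro.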

Here is my code, without error handling, in case someone can spot something:

void CudaJpeg::Compress(
    void* pY, int pitchY,   // input Y plane and its pitch
    void* pU, int pitchU,   // input U plane and its pitch
    void* pV, int pitchV,   // input V plane and its pitch
    void* pJpeg,            // pointer to the result buffer, large enough
    int* pJpegLength)       // returns length of the JPEG file written to the buffer
{
    Npp8u *pdScan = (Npp8u *)pJpeg + jpegPreSize;
    Npp32s nScanLength = 0;

    Npp8u *apDstImage[3] = {(Npp8u*)pY, (Npp8u*)pU, (Npp8u*)pV};
    Npp32s aDstImageStep[3] = {pitchY, pitchU, pitchV};

    // write precalculated JPEG headers
    cudaMemcpyAsync(pJpeg, pdJpegPre, jpegPreSize,
                    cudaMemcpyDeviceToDevice, cudaStream);

    // perform DCT & Co
    for (int i = 0; i < 3; i++)
        nppiDCTQuantFwd8x8LS_JPEG_8u16s_C1R_NEW(
            apDstImage[i], aDstImageStep[i],
            apdDCT[i], aDCTStep[i],
            pdQuantizationTables + oFrameHeader.aQuantizationTableSelector[i] * 64
            /* , remaining arguments missing from the post */);

    // encode and write JPEG data
    nppiEncodeHuffmanScan_JPEG_8u16s_P3R(apdDCT, aDCTStep,
        0, oScanHeader.nSs, oScanHeader.nSe,
        oScanHeader.nA >> 4, oScanHeader.nA & 0x0f,
        pdScan, &nScanLength
        /* , remaining arguments missing from the post */);

    // copy "end of image" marker
    cudaMemcpyAsync(pdScan + nScanLength, pdJpegPost,
                    jpegPostSize, cudaMemcpyDeviceToDevice, cudaStream);

    // calculate total size: header + scanLength + end marker
    *pJpegLength = nScanLength + jpegPreSize + jpegPostSize;
}


If this issue occurs with the latest CUDA version (5.5), it would be helpful if you could file a bug report, using the bug reporting form linked from the registered developer website. Please attach a self-contained repro app that demonstrates the issue. Thank you for your help, and sorry for the inconvenience.

Yes, it occurs with 5.5, but it is very random: I only get it about once in a few hundred compressed images, and not on every run. I don’t know how to reproduce it.
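Since two threads each call nppSetStream before their NPP calls, one thing that may be worth ruling out is interference between the threads. The sketch below assumes, without confirmation from this thread, that the stream set by nppSetStream is state shared by all host threads. It does not call NPP: setCurrentStream and encodeOnCurrentStream are hypothetical stand-ins that only model a shared "current stream", and the mutex shows how the set-stream/encode pair could be serialized for a test run.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a library-wide "current stream" setting.
static std::atomic<int> g_currentStream{-1};
// Guards the whole set-stream + encode sequence.
static std::mutex g_encodeMutex;

void setCurrentStream(int stream) { g_currentStream.store(stream); }

// Stand-in for the encode calls: reports whether the shared setting still
// matches the stream this thread selected.
bool encodeOnCurrentStream(int expected)
{
    std::this_thread::yield();  // give the other thread a chance to interfere
    return g_currentStream.load() == expected;
}

// Processes `images` frames on one stream; returns how often the shared
// setting was changed by another thread between "set" and "encode".
int compressLocked(int stream, int images)
{
    int mismatches = 0;
    for (int i = 0; i < images; ++i) {
        std::lock_guard<std::mutex> lock(g_encodeMutex);
        setCurrentStream(stream);
        if (!encodeOnCurrentStream(stream)) {
            ++mismatches;
        }
    }
    return mismatches;
}

// Runs two workers concurrently, as in the setup described above.
int runTwoWorkers(int images)
{
    int m0 = 0, m1 = 0;
    std::thread t0([&] { m0 = compressLocked(0, images); });
    std::thread t1([&] { m1 = compressLocked(1, images); });
    t0.join();
    t1.join();
    return m0 + m1;
}
```

If the corruption disappears when the real calls are serialized like this, that would point to a threading issue rather than to the encoder itself; if it persists, the lock can be removed again.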