@TomK@NVIDIA
Tom, I checked the code sample, and the cleanup process and the order of the destroy operations are now clear to me.
However, I believe that information belongs in the API documentation, not in a code sample. I hope NVIDIA will consider improving the documentation for the next CUDA release. The documentation also does not explain at which points during decoding cudaStreamSynchronize() must be called to ensure correct results.
Your code demonstrates several decoding approaches in a single method, so it’s hard to separate what is needed for which decoding method and why.
Based on the API parameters, I am assuming that I need the following cudaStreamSynchronize() calls:
- Before calling nvjpegDecodeJpegTransferToDevice() (rationale: that’s the first nvjpeg call to which I pass a CUDA stream handle)
- After calling nvjpegDecodeJpegTransferToDevice() and before calling nvjpegDecodeJpegDevice()
- After calling nvjpegDecodeJpegDevice() and before passing the result to OpenGL interop
Are there any other places that I missed? Is perhaps the first sync superfluous?
Now about CMYK decoding – it seems broken to me (yes, I enabled CMYK decoding in params).
First, nvjpegJpegStreamGetChromaSubsampling() returns NVJPEG_CSS_UNKNOWN for a CMYK image.
The actual image has this information in it:
*** Marker: APP14 (xFFEE) ***
OFFSET: 0x0008D719
Length = 14
DCTEncodeVersion = 100
APP14Flags0 = 16384
APP14Flags1 = 0
ColorTransform = 2 [YCCK]
*** Marker: SOF0 (Baseline DCT) (xFFC0) ***
OFFSET: 0x0008D7AF
Frame header length = 20
Precision = 8
Number of Lines = 4773
Samples per Line = 6000
Image Size = 6000 x 4773
Raw Image Orientation = Landscape
Number of Img components = 4
Component[1]: ID=0x01, Samp Fac=0x11 (Subsamp 1 x 1), Quant Tbl Sel=0x00 (Y)
Component[2]: ID=0x02, Samp Fac=0x11 (Subsamp 1 x 1), Quant Tbl Sel=0x01 (Cb)
Component[3]: ID=0x03, Samp Fac=0x11 (Subsamp 1 x 1), Quant Tbl Sel=0x01 (Cr)
Component[4]: ID=0x04, Samp Fac=0x11 (Subsamp 1 x 1), Quant Tbl Sel=0x00 (K)
To me that means subsampling is known (i.e. there is no subsampling on any of the components), not unknown.
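To make that reading of the SOF0 dump concrete, here is a minimal sketch of how I interpret the "Samp Fac" bytes. The names `SamplingFactor`, `decode_sampling_byte` and `is_not_subsampled` are hypothetical helpers of mine, not part of nvJPEG; the only thing taken from the JPEG format is that the high nibble is the horizontal factor and the low nibble is the vertical factor.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Horizontal and vertical sampling factors of one frame component.
struct SamplingFactor {
    int h;
    int v;
};

// Split the SOF "Samp Fac" byte: high nibble is horizontal, low nibble is vertical.
SamplingFactor decode_sampling_byte(std::uint8_t b) {
    return SamplingFactor{b >> 4, b & 0x0F};
}

// True when every component uses the same factors as the first one,
// i.e. no component is subsampled relative to the others.
bool is_not_subsampled(const std::vector<std::uint8_t>& samp_bytes) {
    assert(!samp_bytes.empty());  // a frame has at least one component
    const SamplingFactor first = decode_sampling_byte(samp_bytes.front());
    for (std::uint8_t b : samp_bytes) {
        const SamplingFactor f = decode_sampling_byte(b);
        if (f.h != first.h || f.v != first.v) return false;
    }
    return true;
}
```

For the image above, all four components carry 0x11, so `is_not_subsampled` returns true for them.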
Moreover, color space information is not exposed, and NVJPEG seems to assume that the color space is always YCbCr. For this particular image the color space is recorded in the APP14 marker as YCCK (Adobe-specific CMYK).
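As a workaround sketch, the Adobe ColorTransform byte can be read directly from the file before handing it to the decoder. The function name `find_adobe_color_transform` is hypothetical and this is not an nvJPEG call. It assumes the usual Adobe APP14 layout ("Adobe", 2-byte version, 2-byte flags0, 2-byte flags1, 1-byte transform) and that every marker before SOS carries a length field, which holds for ordinary header segments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Returns the Adobe APP14 ColorTransform byte:
//   0 = no transform (RGB or CMYK), 1 = YCbCr, 2 = YCCK.
// Returns -1 when no Adobe APP14 segment is found before SOS/EOI
// or when the header data is malformed.
int find_adobe_color_transform(const std::vector<std::uint8_t>& jpeg) {
    // A JPEG stream must start with SOI (FF D8).
    if (jpeg.size() < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8) return -1;
    std::size_t pos = 2;
    while (pos + 4 <= jpeg.size()) {
        if (jpeg[pos] != 0xFF) return -1;             // lost marker sync, give up
        const std::uint8_t marker = jpeg[pos + 1];
        if (marker == 0xFF) { ++pos; continue; }      // fill byte before a marker
        if (marker == 0xDA || marker == 0xD9) break;  // SOS or EOI: headers are over
        // Segment length is big-endian and counts its own two bytes.
        const std::size_t len =
            (static_cast<std::size_t>(jpeg[pos + 2]) << 8) | jpeg[pos + 3];
        if (len < 2 || pos + 2 + len > jpeg.size()) return -1;
        const std::uint8_t* payload = jpeg.data() + pos + 4;
        // APP14 is marker FF EE; the transform byte is at payload offset 11.
        if (marker == 0xEE && len >= 14 && std::memcmp(payload, "Adobe", 5) == 0) {
            return payload[11];
        }
        pos += 2 + len;  // skip marker (2 bytes) plus segment
    }
    return -1;
}
```

For the file above this should return 2 (YCCK), matching the marker dump.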
Moreover, nvjpegJpegStreamGetFrameDimensions() only initializes Widths[0] and Heights[0] for a CMYK image, even though there are 4 components (all of the same size in this case).
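For comparison, this is a minimal sketch of the per-component sizes I would expect, computed from the frame size and the sampling factors by the JPEG rule x_i = ceil(X * H_i / H_max), y_i = ceil(Y * V_i / V_max). `ComponentInfo`, `PlaneSize` and `component_plane_sizes` are hypothetical names of mine and say nothing about how nvJPEG computes its values.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct ComponentInfo { int h; int v; };           // sampling factors from SOF
struct PlaneSize     { int width; int height; };  // size of one component plane

// Per-component plane sizes: ceil(frame * factor / max_factor) in each direction.
std::vector<PlaneSize> component_plane_sizes(int frame_w, int frame_h,
                                             const std::vector<ComponentInfo>& comps) {
    assert(!comps.empty());
    int h_max = 1, v_max = 1;
    for (const ComponentInfo& c : comps) {
        assert(c.h >= 1 && c.v >= 1);  // sampling factors are 1..4 in JPEG
        h_max = std::max(h_max, c.h);
        v_max = std::max(v_max, c.v);
    }
    std::vector<PlaneSize> sizes;
    for (const ComponentInfo& c : comps) {
        // Integer ceiling division: (a + b - 1) / b.
        const int w = (frame_w * c.h + h_max - 1) / h_max;
        const int h = (frame_h * c.v + v_max - 1) / v_max;
        sizes.push_back(PlaneSize{w, h});
    }
    return sizes;
}
```

With four components at 1 x 1 and a 6000 x 4773 frame, every entry comes out as 6000 x 4773, which is what I would expect to see in all four slots.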
If I request decoding with NVJPEG_OUTPUT_BGRI, the decoded result is wrong (and I don’t mean that the colors differ a little because ICC profile conversion was not used).
If I request decoding with NVJPEG_OUTPUT_UNCHANGED, I must use four separate buffers. Why is there no support for returning the unchanged components interleaved?
Finally, in the unchanged mode, the only channel that is passed through correctly without transformation is K; the other channels all differ from the original.
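For reference, here is a minimal sketch of the YCCK to CMYK conversion as libjpeg does it: the first three planes go through the usual YCbCr to RGB formulas, each result is subtracted from 255, and K is copied through untouched. I do not know whether the unchanged output of nvJPEG is meant to be these raw Y, Cb, Cr, K planes; that is an assumption, and `Cmyk`, `clamp_to_byte` and `ycck_to_cmyk` are hypothetical names of mine. Any further inversion that Adobe-written CMYK files may need is not handled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Cmyk { std::uint8_t c, m, y, k; };

// Round to nearest and clamp into the 0..255 sample range.
static std::uint8_t clamp_to_byte(double v) {
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, std::lround(v))));
}

// YCCK -> CMYK for one pixel: YCbCr -> RGB, then C = 255 - R, M = 255 - G,
// Y = 255 - B. The K sample is passed through unchanged.
Cmyk ycck_to_cmyk(std::uint8_t y, std::uint8_t cb, std::uint8_t cr, std::uint8_t k) {
    const double r = y + 1.402 * (cr - 128);
    const double g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128);
    const double b = y + 1.772 * (cb - 128);
    return Cmyk{static_cast<std::uint8_t>(255 - clamp_to_byte(r)),
                static_cast<std::uint8_t>(255 - clamp_to_byte(g)),
                static_cast<std::uint8_t>(255 - clamp_to_byte(b)),
                k};
}
```

If the first three planes from the unchanged mode match the file's Y, Cb, Cr data, running them through something like this would be one way to check them against the original C, M, Y channels.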