Issues with 16b or 32b images with the superres or upscale tools with MAXINE

hansonclint · June 18, 2021, 9:18am

I have been attempting to get 16b images to work with the current Maxine SDK. The starting point I used was the VideoEffectsApp example code that ships with the SDK.

I am able to compile and run the example, upresing 8 bit images just fine for png, jpg, etc. formats. And I must say the results are great for those, as they were with the NGX examples.
However, for our needs we need a higher bit depth than the 8b which the app currently seems limited to. When I tried to upres 16b or 32b images, it did not work and often crashed with errors such as…
Error: invalid pitch argument
or
Error: The specified pixel format is not accommodated

The first thing I did was change the app to accept TIFF and EXR image formats, as they are the common formats for 32b and 16b images and both are supported by the OpenCV libraries. Specifically, float and half-float 3 or 4 channel images respectively.

I also modified the cv::imread as follows…

_srcImg = cv::imread(inFile, cv::IMREAD_ANYCOLOR | cv::IMREAD_ANYDEPTH);

Otherwise the OpenCV libraries will convert 16b images to 8b, etc. I confirmed that at this point the images are coming in correctly (i.e. as the correct 32b and 16b image formats) and that all the formats were set up correctly, and if I display those images they display fine after being loaded into the app.

However, when the code then gets into the NvCVImage* routines to set up the images, it gets into trouble, immediately giving the errors I listed above.

From what I can see, I am pretty certain the code in routines like NvCVImage_Transfer and NvCVImage_Alloc is not handling 32b or 16b formats correctly, despite the documentation saying it can handle them.

The first thing I noticed was that the pitch seems to be incorrect after the allocation routines in the setup. The pitch for the resulting NvCVImage seems to represent the number of pixels across a row, not the byte stride/pitch as the docs say it should.
Also, it appears to me that, despite the images coming in as 8-bit and the final format supposedly being NVCV_F32 and NVCV_BGR, the algorithms are instead treating the 3/4 components of an 8u pixel as a single floating point number that it is then building the model off of. It is a bit of a black box after that point for me, so this is just based on things like the incorrect pitch being set, and the fact that it does indeed seem like it can't handle true floating point 3 or 4 channel images.
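
For reference, the byte pitch of a chunky (interleaved) image can be worked out by hand and compared with what the allocator reports. The sketch below is a minimal illustration, not SDK code: `minChunkyPitchBytes` and `pitchLooksLikeBytes` are hypothetical helper names, and it assumes rows may be padded but are never shorter than width × components × bytes per component.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the Maxine SDK): minimum byte pitch of one
// row of a chunky (interleaved) image, before any alignment padding.
constexpr std::size_t minChunkyPitchBytes(std::size_t widthPixels,
                                          std::size_t numComponents,
                                          std::size_t bytesPerComponent) {
    return widthPixels * numComponents * bytesPerComponent;
}

// Hypothetical sanity check: a pitch equal to the pixel width (rather than the
// byte width) is too small for any multi-byte or multi-channel format.
constexpr bool pitchLooksLikeBytes(std::size_t pitch,
                                   std::size_t widthPixels,
                                   std::size_t numComponents,
                                   std::size_t bytesPerComponent) {
    return pitch >= minChunkyPitchBytes(widthPixels, numComponents, bytesPerComponent);
}
```

For a 1920-pixel-wide, 3-channel f32 image this gives 1920 × 3 × 4 = 23040 bytes, so a reported pitch of 1920 would fail the check.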

I am at a bit of a loss at this point how to debug any further. It is really a shame as we had planned to incorporate these into our pipelines heavily.

I really would love to see how 16b or 32b are supposed to work if indeed they are supported as the documentation suggests. As far as I can tell, even the 8b path is not actually building models correctly, and it might just be a bit of a fluke that it is working at all, given that 3/4 8u pixel components are being treated as a single float by the model instead of, say, 3/4 floats scaled by 1/255.0. That could mean the model could work better than it does, if that is truly what is happening…

Thanks for any help; I definitely appreciate it. Sorry for the long message and the amount of detail, but I honestly believe there is something broken in the current setup for MAXINE, and the only way I could think of conveying that was to be detailed about what I found.

Clint Hanson
Vancouver, BC

tvarshney · June 22, 2021, 2:11pm

Hi @hansonclint, firstly, thank you for the comprehensive explanation. Could you share the media files that you used to conduct the test? This would help us comment on the specifics.

hansonclint · June 27, 2021, 7:51am

Sorry, I was away on vacation. Yes, of course.

Where do I send them?

Andrey1984 · June 27, 2021, 8:52am

https://storage.googleapis.com/video-001/newtest_50p_1080p_32b.exr

@tvarshney these are the files.

hansonclint · June 28, 2021, 11:16am

Thanks for posting those Andrey. I hopefully just sent tvarshney the images and code changes directly as well.

Thanks for taking a look also Andrey.

If there is anything else I can provide to help with testing please let me know…

hansonclint · July 7, 2021, 3:56am

Hey, just wanted to make sure: is there anything else I could do to help from my end?

If there are any more test images you could use, in any formats, etc or any other test setups I could try on the SDK routines please let me know.

Thanks,
Clint

tvarshney · July 7, 2021, 4:38pm

Hi @hansonclint Edit: At the moment, there aren’t any tweaks that could be made to the application to support the 16b and 32b use cases.

hansonclint · July 7, 2021, 6:48pm

I don’t understand 24b? When people talk about image formats, it is 8 bits, 16 bits, or 32 bits per color. Are you saying it only supports 8-bit 3-channel images?

hansonclint · July 7, 2021, 6:49pm

Also, the documentation itself says it supports 32b RGB images, as well as 8-bit and 16-bit images?

hansonclint · July 7, 2021, 6:57pm

//! Transfer one image to another, with a limited set of conversions.
//!
//! If any of the images resides on the GPU, it may run asynchronously,
//! so cudaStreamSynchronize() should be called if it is necessary to run synchronously.
//! The following table indicates (with X) the currently-implemented conversions:
//!    +-------------------+-------------+-------------+-------------+-------------+
//!    |                   |  u8 --> u8  |  u8 --> f32 | f32 --> u8  | f32 --> f32 |
//!    +-------------------+-------------+-------------+-------------+-------------+
//!    | Y      --> Y      |      X      |             |      X      |      X      |
//!    | Y      --> A      |      X      |             |      X      |      X      |
//!    | Y      --> RGB    |      X      |      X      |      X      |      X      |
//!    | Y      --> RGBA   |      X      |      X      |      X      |      X      |
//!    | A      --> Y      |      X      |             |      X      |      X      |
//!    | A      --> A      |      X      |             |      X      |      X      |
//!    | A      --> RGB    |      X      |      X      |      X      |      X      |
//!    | A      --> RGBA   |      X      |             |             |             |
//!    | RGB    --> Y      |      X      |      X      |             |             |
//!    | RGB    --> A      |      X      |      X      |             |             |
//!    | RGB    --> RGB    |      X      |      X      |      X      |      X      |
//!    | RGB    --> RGBA   |      X      |      X      |      X      |      X      |
//!    | RGBA   --> Y      |      X      |      X      |             |             |
//!    | RGBA   --> A      |             |      X      |             |             |
//!    | RGBA   --> RGB    |      X      |      X      |      X      |      X      |
//!    | RGBA   --> RGBA   |      X      |      X      |      X      |      X      |
//!    | RGB    --> YUV420 |      X      |             |      X      |             |
//!    | RGBA   --> YUV420 |      X      |             |      X      |             |
//!    | RGB    --> YUV422 |      X      |             |      X      |             |
//!    | RGBA   --> YUV422 |      X      |             |      X      |             |
//!    | RGB    --> YUV444 |      X      |             |      X      |             |
//!    | RGBA   --> YUV444 |      X      |             |      X      |             |
//!    | YUV420 --> RGB    |      X      |      X      |             |             |
//!    | YUV420 --> RGBA   |      X      |      X      |             |             |
//!    | YUV422 --> RGB    |      X      |      X      |             |             |
//!    | YUV422 --> RGBA   |      X      |      X      |             |             |
//!    | YUV444 --> RGB    |      X      |      X      |             |             |
//!    | YUV444 --> RGBA   |      X      |      X      |             |             |
//!    +-------------------+-------------+-------------+-------------+-------------+
//! where
//! * Either source or destination can be CHUNKY or PLANAR.
//! * Either source or destination can reside on the CPU or the GPU.
//! * The RGB components are in any order (i.e. RGB or BGR; RGBA or BGRA).
//! * For RGBA (or BGRA) destinations, most implementations do not change the alpha channel, so it is recommended to
//!   set it at initialization time with [cuda]memset(im.pixels, -1, im.pitch * im.height) or
//!   [cuda]memset(im.pixels, -1, im.pitch * im.height * im.numComponents) for chunky and planar images respectively.
//! * YUV requires that the colorspace field be set manually prior to Transfer, e.g. typical for layout=NVCV_NV12:
//!   image.colorspace = NVCV_709 | NVCV_VIDEO_RANGE | NVCV_CHROMA_INTSTITIAL;
//! * There are also RGBf16-->RGBf32 and RGBf32-->RGBf16 transfers.
//! * Additionally, when the src and dst formats are the same, all formats are accommodated on CPU and GPU,
//!   and this can be used as a replacement for cudaMemcpy2DAsync() (which it utilizes). This is also true for YUV,
//!   whose src and dst must share the same format, layout and colorspace.
//!
//! When there is some kind of conversion AND the src and dst reside on different processors (CPU, GPU),
//! it is necessary to have a temporary GPU buffer, which is reshaped as needed to match the characteristics
//! of the CPU image. The same temporary image can be used in subsequent calls to NvCVImage_Transfer(),
//! regardless of the shape, format or component type, as it will grow as needed to accommodate
//! the largest memory requirement. The recommended usage for most cases is to supply an empty image
//! as the temporary; if it is not needed, no buffer is allocated. NULL can be supplied as the tmp
//! image, in which case an ephemeral buffer is allocated if needed, with resultant
//! performance degradation for image sequences.
//!
//! \param[in]      src     the source image.
//! \param[out]     dst     the destination image.
//! \param[in]      scale   a scale factor that can be applied when one (but not both) of the images
//!                         is based on floating-point components; this parameter is ignored when all image components
//!                         are represented with integer data types, or all image components are represented with
//!                         floating-point data types.
//! \param[in]      stream  the stream on which to perform the copy. This is ignored if both images reside on the CPU.
//! \param[in,out]  tmp     a temporary buffer that is sometimes needed when transferring images
//!                         between the CPU and GPU in either direction (can be empty or NULL).
//!                         It has the same characteristics as the CPU image, but it resides on the GPU.
//! \return         NVCV_SUCCESS           if successful,
//!                 NVCV_ERR_PIXELFORMAT   if one of the pixel formats is not accommodated.
//!                 NVCV_ERR_CUDA          if a CUDA error has occurred.
//!                 NVCV_ERR_GENERAL       if an otherwise unspecified error has occurred.
NvCV_Status NvCV_API NvCVImage_Transfer(
             const NvCVImage *src, NvCVImage *dst, float scale, struct CUstream_st *stream, NvCVImage *tmp);

Hopefully that formats OK. But you can see that it lists support for 32b 3 and 4 channel images…
or 128 bits per pixel if you want to look at it that way…

tvarshney · July 7, 2021, 7:09pm

Hi @hansonclint

Apologies for the confusion, editing my original response to avoid mix-ups. I think the 32b reference is to account for the alpha channel (8 bits per color and 8 bits for the alpha channel).

Links to the documentation: Video Effects SDK Programming Guide :: NVIDIA Maxine Documentation

hansonclint · July 7, 2021, 7:19pm

I am not sure how that could make sense then?

What is an 8b RGBA image from the chart above (from the SDK function's actual declaration) then?
What does an 8b RGBA to 32f RGBA transfer mean then?
How can you then create an 8-bit RGBA image according to what you are saying?

Image formats are always declared by how many bits per channel, not how many bytes per pixel.
And that is what the declaration in that header file above supports as well…

So, for example, an RGBA 32b image has 32 bits per channel, i.e. 32 bits for R, 32 bits for G, and 32 bits for B, with an optional 32 bits for the alpha (A). So 128 bits total for a 32-bit RGBA image.

Similarly, for an 8-bit image it is 8 bits for each of R, G, B, and possibly A, for 24 bits or 32 bits total.

This is also how games look at it…

It wouldn’t make sense for that chart to say it supports an 8-bit RGBA image if the total is 8 bits. It can’t have 2 bits per channel?

Also, this is why there is an f beside the 32f image format: when you go to 32 bits for each channel (for each of RGBA) you usually use a floating point representation for the value at that point. Again, this is supported by the declaration for that function.

The chart shows that it supports transfer of an 8-bit RGBA image to a 32-bit RGBA image. The only possible way that could make sense is the normal understanding of image formats. This is also how OpenCV sees images…?
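
The arithmetic in this post can be written out as a small sketch (the function name `bitsPerPixel` is made up for illustration and is not from any SDK):

```cpp
// Total bits per pixel = bits per channel * number of channels.
constexpr unsigned bitsPerPixel(unsigned bitsPerChannel, unsigned channels) {
    return bitsPerChannel * channels;
}

static_assert(bitsPerPixel(8, 3) == 24, "8-bit RGB");
static_assert(bitsPerPixel(8, 4) == 32, "8-bit RGBA");
static_assert(bitsPerPixel(32, 4) == 128, "32f RGBA");
```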

tvarshney · July 7, 2021, 10:00pm

@hansonclint

Apologies for the confusion Clint. I see the issue here, there are two nomenclatures at play.

You are correct in your understanding about the 8 bit, 16 bit and 32 bit imagery and how the NvCVImage (and OpenCV) representations work. NvCVImage_Alloc() and functions similar to it (Realloc, Dealloc, Create, Destroy) should work just fine for u16, s16, u32, s32, etc. However, NvCVImage_Transfer() only accommodates a subset of u8 and f32 components and formats, which are listed in the table in nvCVImage.h around line 360. You can copy u16 and u32 to f32 as:
*pF32 = *pU16 / 65535.0f;
*pF32 = *pU32 / 4294967295.0f;
because Super-Res expects images to be in the normalized BGR f32 planar format.
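
A minimal sketch of that conversion for a whole image, under some assumptions: the source is an interleaved (chunky) u16 image as OpenCV would load it (BGR order), the destination is tightly packed planar f32, and `u16ChunkyToF32Planar` is a hypothetical helper rather than an SDK function. It only prepares the data; whether the Super-Res model gives good results on such input is a separate question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not an SDK call: converts an interleaved (chunky) u16
// image with `channels` components per pixel into planar f32 in [0, 1].
// srcPitchBytes is the byte stride between source rows (it may include padding).
std::vector<float> u16ChunkyToF32Planar(const std::uint8_t* src,
                                        std::size_t width, std::size_t height,
                                        std::size_t channels,
                                        std::size_t srcPitchBytes) {
    assert(srcPitchBytes >= width * channels * sizeof(std::uint16_t));
    const std::size_t planeSize = width * height;
    std::vector<float> dst(planeSize * channels);
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t* row = src + y * srcPitchBytes;
        for (std::size_t x = 0; x < width; ++x) {
            for (std::size_t c = 0; c < channels; ++c) {
                std::uint16_t v;
                // memcpy avoids alignment and aliasing problems on the byte buffer.
                std::memcpy(&v, row + (x * channels + c) * sizeof v, sizeof v);
                // Channel order is preserved: plane c holds component c of every pixel.
                dst[c * planeSize + y * width + x] = v / 65535.0f;
            }
        }
    }
    return dst;
}
```

With a `cv::Mat` of type `CV_16UC3`, the arguments would be `mat.data`, `mat.cols`, `mat.rows`, `3` and `mat.step`; the same loop with a divisor of 4294967295.0f covers the u32 case.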

We typically use u8, u16, u32, f32, etc. to refer to the type and size of the components used in a pixel.

In this context, when someone talks about a 32 bit pixel, they are generally referring to a chunky pixel localized into a compact 32-bit location (rather than the total 128 bits which you might be expecting when you mention 32 bit imagery). We might call that RGBA u8 chunky or BGRA u8 chunky. However, there is also RGBA u8 planar, where the pixel is not compact at all, and in fact the R, G, B and A components are separated by megabytes.
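
To illustrate the difference in layout, here is a rough sketch of where the components of a pixel land in memory for chunky and planar images. The helper names are hypothetical, and rows are assumed to be tightly packed (no pitch padding).

```cpp
#include <cstddef>

// Byte offset of component c of pixel (x, y) in a chunky (interleaved) image:
// all components of one pixel sit next to each other.
constexpr std::size_t chunkyOffset(std::size_t x, std::size_t y, std::size_t c,
                                   std::size_t width, std::size_t channels,
                                   std::size_t bytesPerComponent) {
    return ((y * width + x) * channels + c) * bytesPerComponent;
}

// Byte offset of the same component in a planar image:
// each component has its own full width * height plane.
constexpr std::size_t planarOffset(std::size_t x, std::size_t y, std::size_t c,
                                   std::size_t width, std::size_t height,
                                   std::size_t bytesPerComponent) {
    return ((c * height + y) * width + x) * bytesPerComponent;
}
```

For a 1920×1080 u8 RGBA image, the chunky offsets of the four components of pixel (0, 0) are 0, 1, 2 and 3, while the planar offsets are 0, 2073600, 4147200 and 6220800, i.e. roughly 2 MB apart.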

It seems that you have found some inconsistencies in our documentation in this regard. That’s where the confusion arose. Thank you for bringing this up. We will be sure to address this in our documentation.

That said, while you can attempt to use u16 and u32 images with our SDK, we have trained our models primarily on u8 imagery, so you may or may not get results matching your expectations.

hansonclint · July 7, 2021, 11:16pm

OK. That’s too bad, but at least I have an answer. Thanks for looking into this. I did try to run it through; I think there is some confusion inside the actual app itself about what is expected to be fed in. So I can’t even get the data into the buffer, as the routines' declarations say one thing but, if I understand what you are saying, actually only support 8u bits per channel. When I feed in any OpenCV data that has 16 or 32 bits per channel, routines like NvCVImage_Transfer error out or just produce garbage out the other side. It’s a real disappointment for us. I guess I expected, based on this post…

https://forums.developer.nvidia.com/t/16-bit-buffer-superresolution-scaling/158750

that Broadcast/MAXINE would support 32-bit floating point images.

Do you know whether DLSS supports a true 32f RGBA image, i.e. a float for each of R, G, B, A, or even 16-bit, as OpenCV etc. understands it?

Or was it trained only on essentially 8-bit images as well?

Andrey1984 · July 11, 2021, 1:40pm

It is also too bad that the Broadcast app cannot run on any cloud instance from many providers [Azure, Google, Amazon, etc.], even with RTX workstations/servers.

hansonclint · July 15, 2021, 6:16pm

Just wanted to follow up to make sure I understand.

tvarshney… when Andrew Page said this…

"The NGX SDK is 8bit. We have a newer SuperRes network as part of the Video Effects SDK https://developer.nvidia.com/video-effects-sdk-windows which uses float.

AP"

What did he mean by floating point support then? I don’t see how that works with the rest of the above?

I honestly am still a bit confused.

Clint

tvarshney · July 16, 2021, 10:21pm

Hey @hansonclint, some of the API functions have floating point support (e.g. the NvCVImage_Alloc function). You can try out some modifications (as I suggested above) and try to make it work, but we have trained our models primarily on u8 imagery, so you may or may not get results matching your expectations.

system · September 14, 2021, 10:22pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
