Frame triggering on the Orin Devkit, driver initialization

Hello,

We’re experimenting with the advanced camera triggering options on the Orin devkit. To start, I tried running the sample camera app with the command-line argument that selects Hetero Frame Sync (-D). Setting it had no effect on the program. A little debugging showed that deserializerInterfaceProvider->GetInterface(MAX96712_FUSA_NV_CUSTOM_INTERFACE_ID) returns null. Probing further, I saw that the deserializer names in the platform configuration are set to “MAX96712”. (We’re using platform configuration F008A120RM0AV2_CPHY_x4.) Out of curiosity, I tried overriding the name with “MAX96712_Fusa_nv”: I kept the rest of the platform configuration intact and added a small piece of code that changes all of the deserializer names. However, this causes a memory crash, specifically “raised STORAGE_ERROR : stack overflow or erroneous memory access”, inside the m_pCamera->Init() call.
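For readers who hit the same symptom, here is a minimal sketch of why a lookup by interface ID can come back null and leave the command-line option silently ignored. Everything in it is a hypothetical stand-in: the `InterfaceProvider` class, the `kFusaCustomInterfaceId` value, and the `HeteroFrameSyncInterface` type are invented for illustration and are not the real NvSIPL types. It only assumes, as the post suggests, that the custom interface exists when the Fusa-style driver is the one loaded.

```cpp
#include <cstdint>
#include <iostream>
#include <string>

// Hypothetical stand-ins; these are NOT the real NvSIPL types.
struct Interface {
    virtual ~Interface() = default;
};

struct HeteroFrameSyncInterface : Interface {
    bool SetHeteroFrameSync() { return true; }
};

using InterfaceId = std::uint32_t;
constexpr InterfaceId kFusaCustomInterfaceId = 0x96712F5Au;  // made-up value

// A provider that only exposes the custom interface when the
// Fusa-style driver was selected for the deserializer.
class InterfaceProvider {
public:
    explicit InterfaceProvider(const std::string& driverName)
        : m_hasCustom(driverName == "MAX96712_Fusa_nv") {}

    Interface* GetInterface(InterfaceId id) {
        if (m_hasCustom && id == kFusaCustomInterfaceId) {
            return &m_custom;
        }
        return nullptr;  // this driver does not know the requested ID
    }

private:
    bool m_hasCustom;
    HeteroFrameSyncInterface m_custom;
};

// Returns true only when the custom interface exists and the call succeeds.
// Reporting the null case makes the "option had no effect" symptom visible.
bool TryEnableHeteroFrameSync(InterfaceProvider& provider) {
    Interface* iface = provider.GetInterface(kFusaCustomInterfaceId);
    if (iface == nullptr) {
        std::cerr << "custom interface not available; frame sync option ignored\n";
        return false;
    }
    return static_cast<HeteroFrameSyncInterface*>(iface)->SetHeteroFrameSync();
}
```

In this sketch, constructing the provider with "MAX96712" makes `TryEnableHeteroFrameSync` return false and log a message, while "MAX96712_Fusa_nv" makes it return true.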

So, I have a few questions. The Advanced GPIO Frame Sync Triggering is supposed to work more or less out of the box, no? Or what additional configuration do I need to do? (Yes I confirmed that we have Board Variant p3710-10-s05 and that the installed OS version is 6.0.8.1-34171226) If the deserializer names are set to “MAX96712” in the platform configuration, that actually guarantees that it won’t be able to do any of the advanced functions, no? Because it will load the old driver? So why are any platform configs set to that at all? In general is it ok for me to just change the name of the deserializer driver like that? Or is that breaking all kinds of assumptions and is bound to cause it to crash?

And finally, I think it’s actually a little interesting that the m_pCamera->Init() call failed with an internal memory corruption. Even if there is an error in my arguments to the method, these things should generally return with a status, not just blow up the call stack, no?

Thanks so much for your help, I’d just like to say that I’m super-impressed with the quality of the product in general.

Jason Catlin

Could you please let me know in which file you changed the deserializer names?

Sure! I added this, in main.cpp:

LOG_INFO("Enumerating %d device blocks\n", oPlatformCfg.numDeviceBlocks);
for (int i = 0; i < oPlatformCfg.numDeviceBlocks; i++) {
    DeviceBlockInfo& block = oPlatformCfg.deviceBlockList[i];
    // Log the deserializer name before the override.
    LOG_INFO(" - %s %d\n", block.deserInfo.name.c_str(), int(block.deserInfo.useCDIv2API));

    block.deserInfo.name = NEW_DRIVER_NAME;
    // Log the deserializer name after the override.
    LOG_INFO(" -- %s %d\n", block.deserInfo.name.c_str(), int(block.deserInfo.useCDIv2API));
}

This is after the files are parsed, and immediately before the masking is applied.

Entron F008A120RM0AES is compatible with the MAX96712 deserializer driver (/drive/drive-linux/samples/nvmedia/nvsipl/devblk/devices/MAX96712DeserializerDriver). Please don’t use the MAX96712_Fusa_nv driver with the Entron F008A120RM0AE library you package; it will not work properly.

Additionally, the supported camera modules with MAX96712_Fusa_nv are Smartlead IMX728, IMX623, Leopard OV2311 B2, Smartlead OX5B modules.

Hey man, thanks so much for your help.

But I have to admit I’m a little confused. The deserializer driver and the camera driver are really mostly separate things, right? They only have limited interaction beyond the deserializer simply ordering the camera to take a picture. So why is it that the camera driver has to be compatible with this? Like, if the registers on the MAX96712 are configured to take FSYNC signals, how does this really even concern the cameras at all?

Hey,

I’ve been doing some more work on this. In order to solve the problem, I decided to try making a modified version of the old driver (MAX96712DeserializerDriver) with just the “SetHeteroFrameSync” function added to it, grafted from the Fusa driver back onto the older driver. To this end, I created a copy of the MAX96712 driver, exported through a “.so” file. This works fine! I can run the camera app and everything using our own custom driver, which at this point is nothing but the provided NVIDIA driver.

Now, the plan is to eventually engage in more extensive upgrades to add Frame Syncing functionality. But, I noticed that before I even did that at all, adding members to the Context struct causes it to crash. That is, in the file “cdi_max96712.h” there is a struct ContextMAX96712. You can add members to that struct, even members that don’t do anything, like so:

typedef struct {
    /* These must be set in supplied client ctx during driver creation */
    GMSLModeMAX96712 gmslMode[MAX96712_MAX_NUM_LINK];

    ... other members here ...

    CfgPipelineCopyMAX96712 cfgPipeCopy;

    int padding[4]; // NEW!

} ContextMAX96712;

Adding that “padding” variable causes a memory crash, from deep down within the NVIDIA stack. Now mind you, I haven’t added any Frame Syncing functionality yet. The plan is to do that in the future of course, but this version doesn’t. This version has nothing other than additional members added to the Context struct, and it causes a memory crash.

So at this point, I really believe that it’s very likely that there’s some sort of memory corruption bug in the NVIDIA stack somewhere, a dangling pointer or something. Of course anything’s possible, but I really can’t explain this any other way. This is the bog-standard NVIDIA driver, except it’s imported as a .so. When you add that padding, and nothing else, it causes a memory crash. By the way, it even works when you change it to “padding[1]”! There’s no crash in that case! It only crashes when you add four words of padding.

The deserializer isn’t an independent component; it must be matched with the serializer setting in the camera module to establish the forward/reverse channels correctly. Therefore, their settings are logically coupled in the SIPL Devblk CDD library.

The MAX96712 deserializer driver provides minimal support to enable specific cameras, such as the Entron F008A120RM0AES, which only requires a 30FPS sync signal. Conversely, the MAX96712_Fusa_nv driver is compatible with different camera sets, like those used for OMS and DMS applications, which need various sync signals (e.g., 60FPS, 30FPS, 15FPS). This driver can select the sync signal source based on the camera requirements.

Due to these differences, each driver has distinct APIs used to pair SerDes in the SIPL Devblk CDD library. Using the wrong driver causes issues due to this mismatch. Hence, you should use the correct deserializer driver that matches your camera module’s requirements to avoid compatibility problems.
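As a rough illustration of the pairing rule described above, the check below encodes only the driver and module combinations named in this thread. It is a sketch, not an exhaustive or authoritative list: the function names are hypothetical, and the string spellings are illustrative and may not match the names used in an actual platform configuration.

```cpp
#include <map>
#include <set>
#include <string>

// Pairings as stated in this thread only. The spellings are illustrative
// and the table is not a complete list of supported modules.
const std::map<std::string, std::set<std::string>>& PairingsFromThread() {
    static const std::map<std::string, std::set<std::string>> table = {
        {"MAX96712", {"Entron F008A120RM0AES"}},
        {"MAX96712_Fusa_nv",
         {"Smartlead IMX728", "IMX623", "Leopard OV2311 B2", "Smartlead OX5B"}},
    };
    return table;
}

// Returns true when the thread names this module as supported by the
// given deserializer driver; unknown drivers or modules return false.
bool IsPairingNamedInThread(const std::string& deserializer,
                            const std::string& module) {
    const auto& table = PairingsFromThread();
    const auto it = table.find(deserializer);
    return it != table.end() && it->second.count(module) > 0;
}
```

A check like this, run before initialization, would reject the "MAX96712_Fusa_nv" plus Entron combination up front instead of letting it fail later inside the driver stack.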

Yeah man, I absolutely agree that there could be complicated interactions between the deserializer and the serializer. That’s why I tried, as an experiment, to build a version that just adds members to a struct and nothing else. The code that I tried running uses the MAX96712 deserializer, exactly, with no changes other than adding members to a Context struct that don’t do anything. That’s the point of the experiment. It doesn’t add frame syncing functionality, it doesn’t add any custom interfaces, it doesn’t change any of the core I2C driver logic. It has one and only one change, which is to add padding variables to the Context struct, which are never even read from or written to. This causes a memory crash.

Thank you for sharing your detailed findings from your experiments. To assist you further, could you please provide the patch and detailed steps you followed so we can attempt to reproduce the issue on our end?

Additionally, please note that support for camera driver development is typically provided to specific customers only and not through this forum. This matter may fall outside the scope of support we can offer here. You may need to reach out to your NVIDIA representative to discuss your camera-related driver development and obtain the necessary support.

deser.zip (66.7 KB)
deser_bin.zip (53.2 KB)
main_cpp.zip (18.3 KB)

Three attachments:

  • The source code for the standalone deserializer driver, which is the MAX96712 copied out of the source tree with exactly one change, adding members to a Context struct in cdi_max96712.h
  • The compiled binary .so file of the deserializer driver
  • The main.cpp from the NVIDIA camera sample app, modified simply to use the custom deserializer

Now the BUILD file is written for nuro’s build system, so you’d have to rewrite that if you wanted to recompile it. I included the binary compiled deserializer driver too, if you just want to use that, so that should make things easier for you.

Other than that, I run the app with:

sudo nvsipl_camera.runfiles/experimental/jcatlin/nvsipl_camera_2/nvsipl_camera_native -c "F008A120RM0AV2_CPHY_x4" --link-enable-masks "0x1111 0x0000 0x1100 0x0000" -sR -r 1

And when I do that I get a memory crash. Thanks so much for your help!

Could you please share the complete output from executing the command you provided? This will help in better understanding the context of the memory crash.

Have you attempted a similar experiment with the original existing driver name to observe any differences in behavior?

While I’m keen to assist you, I must mention that the matter might fall beyond the scope of support we can offer here in the forum. Eventually, you may need to engage with your NVIDIA representative to discuss your camera-related driver development and obtain the appropriate support channels.

Yeah you bet. This is the output from the crashing version:

nvidia@orin-2a:~/jcatlin/jason$ sudo nvsipl_camera.runfiles/experimental/jcatlin/nvsipl_camera_2/nvsipl_camera_native -c "F008A120RM0AV2_CPHY_x4" --link-enable-masks "0x1111 0x0000 0x1100 0x0000" -sR -r 1
Pipeline: 0 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 0 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 0 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
Pipeline: 1 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 1 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 1 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
Pipeline: 2 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 2 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 2 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
Pipeline: 3 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 3 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 3 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
Pipeline: 10 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 10 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 10 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
Pipeline: 11 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 11 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 11 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
DriverCreate enter
DriverCreate exit
DriverCreate enter
DriverCreate exit
MAX96712: Revision 5 detected
MAX96712: Revision 5 detected
MAX96712 Link 2: PHY optimization was enabled
MAX96712 Link 3: PHY optimization was enabled
MAX96712 Link 0: PHY optimization was enabled
MAX96712 Link 1: PHY optimization was enabled
MAX96712 Link 2: PHY optimization was enabled
MAX96712 Link 3: PHY optimization was enabled

raised STORAGE_ERROR : stack overflow or erroneous memory access

raised STORAGE_ERROR : stack overflow or erroneous memory access

And then this is the output from the non-crashing version, with the original driver:

nvidia@orin-2a:~/jcatlin/jason$ sudo nvsipl_camera.runfiles/experimental/jcatlin/nvsipl_camera_2/nvsipl_camera_native -c "F008A120RM0AV2_CPHY_x4" --link-enable-masks "0x1111 0x0000 0x1100 0x0000" -sR -r 1
Pipeline: 0 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 0 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 0 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
Pipeline: 1 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 1 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 1 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
Pipeline: 2 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 2 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 2 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
Pipeline: 3 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 3 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 3 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
Pipeline: 10 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 10 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 10 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
Pipeline: 11 ISP Output: 0 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 11 ISP Output: 1 is using YUV 420 SEMI-PLANAR UINT8 BL REC_709ER
Pipeline: 11 ISP Output: 2 is using RGBA PACKED FLOAT16 PL SENSOR_RGBA
MAX96712: Revision 5 detected
MAX96712: Revision 5 detected
MAX96712 Link 2: PHY optimization was enabled
MAX96712 Link 3: PHY optimization was enabled
MAX96712 Link 0: PHY optimization was enabled
MAX96712 Link 1: PHY optimization was enabled
MAX96712 Link 2: PHY optimization was enabled
MAX96712 Link 3: PHY optimization was enabled
MAX9295: Revision 8 detected!
MAX9295: Revision 8 detected!
Sensor AR0820 GRBG Rev 2.1 detected!
Sensor AR0820 GRBG Rev 2.1 detected!
MAX9295: Revision 8 detected!
MAX9295: Revision 8 detected!
Sensor AR0820 GRBG Rev 2.1 detected!
Sensor AR0820 GRBG Rev 2.1 detected!
MAX9295: Revision 8 detected!
Sensor AR0820 GRBG Rev 2.1 detected!
MAX9295: Revision 8 detected!
Sensor AR0820 GRBG Rev 2.1 detected!
nvsipl_camera: Opened NITO file for module "F008A120RM0AV2", file name: "/usr/share/camera/F008A120RM0AV2.nito"
nvsipl_camera: Opened NITO file for module "F008A120RM0AV2", file name: "/usr/share/camera/F008A120RM0AV2.nito"
nvsipl_camera: Opened NITO file for module "F008A120RM0AV2", file name: "/usr/share/camera/F008A120RM0AV2.nito"
nvsipl_camera: Opened NITO file for module "F008A120RM0AV2", file name: "/usr/share/camera/F008A120RM0AV2.nito"
nvsipl_camera: Opened NITO file for module "F008A120RM0AV2", file name: "/usr/share/camera/F008A120RM0AV2.nito"
nvsipl_camera: Opened NITO file for module "F008A120RM0AV2", file name: "/usr/share/camera/F008A120RM0AV2.nito"
Output
… continues …

Could you please provide the output from the command with the “-v 4” option added? It appears that your modification may have resulted in an unexpected null pointer access issue.

Oh hey,

I’m actually 90% sure I know what’s causing this. I was under the impression that each driver module is its own self-contained little codebase. The problem is that cdi_max96712.h is referenced by many files outside the driver, for example “cameramodule/MAX96712cameramodule/CNvMTransportLink_Max96712_9295.cpp”. So the way I’m testing this, recompiling the deserializer driver without recompiling the rest of DRIVE OS, results in two different definitions of that struct in the same process: the rebuilt driver uses my new definition, while the rest of DRIVE OS still expects the old one. All of the call stacks that I’m looking at are consistent with that being the problem.
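A minimal sketch of that mismatch, using heavily simplified and hypothetical stand-ins for the two builds of the context struct (the real ContextMAX96712 has many more members). It shows how much larger the rebuilt layout is than the one the untouched code allocates, plus one possible guard: a size handshake in which the caller passes the size it was compiled with. The handshake is a common general technique, not something the NVIDIA driver is known to do.

```cpp
#include <cstddef>

// Layout seen by code compiled against the original header (simplified).
struct ContextOld {
    int gmslMode[4];
    int cfgPipeCopy;
};

// Layout seen by the rebuilt driver (simplified).
struct ContextNew {
    int gmslMode[4];
    int cfgPipeCopy;
    int padding[4];  // the extra member added in the rebuilt driver
};

// Number of bytes the rebuilt driver believes the context has beyond
// what a caller using the old header actually allocated.
std::size_t BytesBeyondCallerAllocation() {
    return sizeof(ContextNew) - sizeof(ContextOld);
}

// One possible guard (hypothetical): the caller reports the size it was
// compiled with, and the driver refuses a context of a different size.
bool ContextLayoutMatches(std::size_t callerSize) {
    return callerSize == sizeof(ContextNew);
}
```

With this guard, a caller built against the old header would get a clean failure from `ContextLayoutMatches(sizeof(ContextOld))` rather than an out-of-bounds access later on.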

Sorry to bother you.