Hi Nvidia:
We are building our system based on the IGX Orin board kit with IGX OS 1.0.3.
Our first step is to update the BMC firmware. But, as the topic title says, the process is quite unreliable.
We were using curl to update the BMC firmware, as we did before. In our experience, we should first use curl to log in and start the firmware update process, then use curl to check whether the update task has completed.
In our experience, 30 to 40 minutes is long enough for the firmware update to finish once the process has started. That is, after the percent complete of the update task reaches 100, we still give it some extra time to stabilize. Even so, this is still not a reliable way to update the firmware.
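To make the completion check described above concrete, here is a minimal sketch of how the polled task JSON could be classified. It assumes the check should key on TaskState rather than on the percent complete alone. The function name `classify_task` and the return labels are hypothetical; the state names are taken from the Redfish Task schema, and treating only "Completed" with TaskStatus "OK" as success is our own assumption.

```python
# Sketch only: classify a Redfish task resource (already parsed from JSON).
# Ended states per the Redfish Task schema; "Completed" is the only good one.
FAILURE_STATES = {"Exception", "Killed", "Cancelled"}


def classify_task(task):
    """Return "success", "failed", or "in_progress" for a task dict."""
    state = task.get("TaskState", "")
    status = task.get("TaskStatus", "")
    if state == "Completed":
        # Assumption: a completed task with a non-OK status is not trusted.
        return "success" if status == "OK" else "failed"
    if state in FAILURE_STATES:
        return "failed"
    # Any other state (New, Starting, Running, ...) is treated as ongoing,
    # regardless of what PercentComplete reports.
    return "in_progress"


if __name__ == "__main__":
    running = {"TaskState": "Running", "TaskStatus": "OK"}
    aborted = {"TaskState": "Exception", "TaskStatus": "Critical",
               "PercentComplete": 100}
    print(classify_task(running))   # in_progress
    print(classify_task(aborted))   # failed
```

With this kind of check, a task that reports 100 percent but ended in an error state would be flagged as failed instead of being treated as done.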
Here are my questions:
- How can we flash an older BMC firmware? We tried the method we previously used to downgrade the BMC firmware to 23 (the IGX SW 1.0 DP version), but the task state does not behave as expected.
Using curl to update firmware:
curl -k -H "X-Auth-Token:$token" -H "Content-Type: application/octet-stream" -X POST -T `pwd`/cec1736-apfw-11062023.fwpkg https://${bmc}/redfish/v1/UpdateService
{
"@odata.id": "/redfish/v1/TaskService/Tasks/1",
"@odata.type": "#Task.v1_4_3.Task",
"Id": "1",
"TaskState": "Running",
"TaskStatus": "OK"
}
Task state:
{
"@odata.id": "/redfish/v1/TaskService/Tasks/1",
"@odata.type": "#Task.v1_4_3.Task",
"EndTime": "2024-08-05T06:38:17+00:00",
"Id": "1",
"Messages": [
{
"@odata.type": "#Message.v1_0_0.Message",
"Message": "The task with id 1 has started.",
"MessageArgs": [
"1"
],
"MessageId": "TaskEvent.1.0.1.TaskStarted",
"Resolution": "None.",
"Severity": "OK"
},
{
"@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry",
"Message": "Transfer of image '0.0' to '' failed.",
"MessageArgs": [
"0.0",
""
],
"MessageId": "Update.1.0.TransferFailed",
"Resolution": "Debug Token Service is not ready, retry the firmware update operation after the management controller is ready. If the issue still persists reset the baseboard.",
"Severity": "Critical"
},
{
"@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry",
"Message": "The target device 'BMC_FW_AST2600_0' will be updated with image 'cec1736ApFw-09022024'.",
"MessageArgs": [
"BMC_FW_AST2600_0",
"cec1736ApFw-09022024"
],
"MessageId": "Update.1.0.TargetDetermined",
"Resolution": "None.",
"Severity": "OK"
},
{
"@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry",
"Message": "Image 'cec1736ApFw-09022024' is being transferred to 'BMC_FW_AST2600_0'.",
"MessageArgs": [
"cec1736ApFw-09022024",
"BMC_FW_AST2600_0"
],
"MessageId": "Update.1.0.TransferringToComponent",
"Resolution": "None.",
"Severity": "OK"
},
{
"@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry",
"Message": "Verification of image 'cec1736ApFw-09022024' at 'BMC_FW_AST2600_0' failed.",
"MessageArgs": [
"cec1736ApFw-09022024",
"BMC_FW_AST2600_0"
],
"MessageId": "Update.1.0.VerificationFailed",
"Resolution": "None.",
"Severity": "Critical"
},
{
"@odata.type": "#Message.v1_0_0.Message",
"Message": "The task with id 1 has changed to progress 100 percent complete.",
"MessageArgs": [
"1",
"100"
],
"MessageId": "TaskEvent.1.0.1.TaskProgressChanged",
"Resolution": "None.",
"Severity": "OK"
},
{
"@odata.type": "#Message.v1_0_0.Message",
"Message": "The task with id 1 has been aborted.",
"MessageArgs": [
"1"
],
"MessageId": "TaskEvent.1.0.1.TaskAborted",
"Resolution": "None.",
"Severity": "Critical"
},
{
"@odata.type": "#MessageRegistry.v1_4_1.MessageRegistry",
"Message": "The resource property 'BMC_FW_AST2600_0' has detected errors of type 'SKU mismatch'.",
"MessageArgs": [
"BMC_FW_AST2600_0",
"SKU mismatch"
],
"MessageId": "ResourceEvent.1.0.ResourceErrorsDetected",
"Resolution": "Verify the contents of the FW package",
"Severity": "Critical"
}
],
"Name": "Task 1",
"Payload": {
"HttpHeaders": [
"Host: 192.168.1.110",
"User-Agent: curl/7.81.0",
"Accept: */*",
"Content-Length: 67105977"
],
"HttpOperation": "POST",
"JsonBody": "null",
"TargetUri": "/redfish/v1/UpdateService"
},
"PercentComplete": 100,
"StartTime": "2024-08-05T06:38:17+00:00",
"TaskMonitor": "/redfish/v1/TaskService/Tasks/1/Monitor",
"TaskState": "Exception",
"TaskStatus": "Critical"
}
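Since the task output above is long, here is a small sketch of how we could pull out only the entries with Severity "Critical" from such a response. The helper name `critical_messages` is hypothetical, and the sample dict is a trimmed copy of the output above, not a new log.

```python
# Sketch only: list the critical messages in a Redfish task resource.
def critical_messages(task):
    """Return (MessageId, Message) pairs for entries with Severity Critical."""
    return [
        (m.get("MessageId", ""), m.get("Message", ""))
        for m in task.get("Messages", [])
        if m.get("Severity") == "Critical"
    ]


if __name__ == "__main__":
    # Trimmed copy of the task output shown above.
    sample = {
        "TaskState": "Exception",
        "PercentComplete": 100,
        "Messages": [
            {"MessageId": "TaskEvent.1.0.1.TaskStarted",
             "Message": "The task with id 1 has started.",
             "Severity": "OK"},
            {"MessageId": "Update.1.0.VerificationFailed",
             "Message": "Verification of image 'cec1736ApFw-09022024' "
                        "at 'BMC_FW_AST2600_0' failed.",
             "Severity": "Critical"},
        ],
    }
    for message_id, text in critical_messages(sample):
        print(message_id, "-", text)
```

Applied to the full output above, this would list the TransferFailed, VerificationFailed, TaskAborted, and ResourceErrorsDetected (SKU mismatch) entries.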
We want to downgrade the BMC firmware and then update it again so that we can reproduce the issue.
- Firmware for Non-ERoT can be updated using initramfs. Why can’t the firmware for ERoT be updated the same way?
It seems that initramfs could be a more reliable way to update the firmware, but it is available only for Non-ERoT. What would happen if we updated the firmware for ERoT using initramfs?
If more info is needed, please let us know.
If the curl method is actually the most reliable way to update the firmware, please let us know as well.
Many Thanks.