18.4. Firmware First Error Handling

It may be necessary for the platform to process certain classes of errors in firmware before relinquishing control to OSPM for further error handling. Errata management and error containment are two examples where firmware-first error handling is beneficial. Generic hardware error sources support this model through the related source ID.

The platform reports the original error source to OSPM via the hardware error source table (HEST) and sets the FIRMWAREFIRST flag for this error source. In addition, the platform must report a generic error source with a related source ID set to the original source ID. This generic error source is used to notify OSPM of the errors on the original source and their status after the firmware first handling.

There are different notification strategies that can be used in firmware first handling; the following options are available to the platform:

  • Traditional ACPI platforms may use NMI to notify the OSPM of both corrected and uncorrected errors for a given error source

  • Traditional ACPI platforms may use NMI to report uncorrected errors and the SCI to report corrected errors

  • Traditional ACPI platforms may use NMI to report uncorrected errors and polling to notify the OSPM of corrected errors

  • HW-reduced ACPI platforms may use GPIO-signaled events, Interrupt-signaled events, or polling to report corrected errors.

18.4.1. Example: Firmware First Handling Using NMI Notification

If the platform chooses to use NMI to report errors, which is the recommended method for uncorrected errors, the platform follows these steps:

  1. System firmware configures the platform to trigger a firmware handler when the error occurs

  2. System firmware identifies the error source for which it will handle errors via the error source enumeration interface by setting the FIRMWARE_FIRST flag

  3. System firmware describes the generic error source, and the associated error status block, as described in Generic Hardware Error Source. System firmware identifies the relation between the generic error sourcde and the original error source by using the original source ID in the related source ID of Generic Hardware Error Source Structure.

  4. When a hardware error reported by the error source occurs, system firmware gains control and handles the error condition as required. Upon completion system firmware should do the following:

  5. Extract the error information from the error source and fill in the error information in the data block of the generic error source it identified as an alternate in step 3. The error information format follows the specification in Generic Error Data

  6. Set the appropriate bit in the block status field (Generic Error Status Block ) to indicate to the OSPM that a valid error condition is present.

  7. Clears error state from the hardware.

  8. Generates an NMI.

At this point, the OSPM NMI handler scans the list of generic error sources to find the error source that reported the error and processes the error report