18.5. Error Serialization

  • The error record serialization feature is used to save and retrieve hardware error information to and from a persistent store. OSPM interacts with the platform through a platform interface. If the Error Record Serialization Table (ERST) is present, OSPM uses the ACPI solution described below. Otherwise, OSPM uses the UEFI runtime variable services to carry out error record persistence operations on UEFI-based platforms.

  • For error persistence across boots, the platform must implement some form of non-volatile store to save error records. The amount of space required depends on the platform’s processor architecture. Typically, this store will be flash memory or some other form of non-volatile RAM.

  • Serialized errors are encoded according to the Common Platform Error Record (CPER) format, which is described in the appendices of the UEFI Specification. These entries are referred to as error records.

  • The Error Record Serialization Interface is designed to be sufficiently abstract to allow hardware vendors flexibility in how they implement their error record serialization hardware. The platform provides details necessary to communicate with its serialization hardware by populating the ERST with a set of Serialization Instruction Entries. One or more serialization instruction entries comprise a Serialization Action. OSPM carries out serialization operations by executing a series of Serialization Actions. Serialization Actions and Serialization Instructions are described in detail in the following sections.

The following table details the ERST layout, which system firmware is responsible for building.

Table 18.16 Error Record Serialization Table (ERST)

| Field | Byte Length | Byte Offset | Description |
|---|---|---|---|
| ACPI Standard Header | | | |
| Header Signature | 4 | 0 | “ERST”. Signature for the Error Record Serialization Table. |
| Length | 4 | 4 | Length, in bytes, of the entire ERST. The entire table must be contiguous. |
| Revision | 1 | 8 | 1 |
| Checksum | 1 | 9 | Entire table must sum to zero. |
| OEMID | 6 | 10 | OEM ID. |
| OEM Table ID | 8 | 16 | The manufacturer model ID. |
| OEM Revision | 4 | 24 | OEM revision of the ERST for the supplied OEM Table ID. |
| Creator ID | 4 | 28 | Vendor ID of the utility that created the table. |
| Creator Revision | 4 | 32 | Revision of the utility that created the table. |
| Serialization Header | | | |
| Serialization Header Size | 4 | 36 | Length, in bytes, of the serialization header. |
| Reserved | 4 | 40 | Must be zero. |
| Instruction Entry Count | 4 | 44 | The number of Serialization Instruction Entries in the Serialization Action Table. |
| Serialization Action Table | | | |
| Serialization Instruction Entries | Instruction Entry Count × 32 | 48 | A series of error logging instruction entries. |
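The following minimal Python sketch shows how OSPM might consume this layout. It decodes and checks the fixed 48-byte portion of an ERST image. It assumes the table has already been copied into a `bytes` object; locating and mapping the table is out of scope here. The names `parse_erst_header`, `ErstHeader`, and `ENTRY_SIZE` are illustrative only and are not part of any existing API.

```python
import struct
from typing import NamedTuple

# ACPI standard header (36 bytes) followed by the serialization header
# (12 bytes), little-endian, in the order given by Table 18.16.
ERST_HEADER_FMT = "<4sIBB6s8sI4sI" + "III"
ERST_HEADER_SIZE = struct.calcsize(ERST_HEADER_FMT)  # 48
ENTRY_SIZE = 32  # one Serialization Instruction Entry (Table 18.19)


class ErstHeader(NamedTuple):
    signature: bytes
    length: int
    revision: int
    checksum: int
    oem_id: bytes
    oem_table_id: bytes
    oem_revision: int
    creator_id: bytes
    creator_revision: int
    serialization_header_size: int
    reserved: int
    entry_count: int


def parse_erst_header(table: bytes) -> ErstHeader:
    """Decode and sanity-check the fixed 48-byte part of an ERST image."""
    if len(table) < ERST_HEADER_SIZE:
        raise ValueError("buffer too short for the ERST header")
    hdr = ErstHeader(*struct.unpack_from(ERST_HEADER_FMT, table, 0))
    if hdr.signature != b"ERST":
        raise ValueError("signature is not 'ERST'")
    if hdr.length > len(table):
        raise ValueError("Length field exceeds the buffer")
    # The entire table, checksum byte included, must sum to zero modulo 256.
    if sum(table[:hdr.length]) & 0xFF:
        raise ValueError("checksum mismatch")
    # Entries start at offset 48 and must fit inside the declared Length.
    if ERST_HEADER_SIZE + hdr.entry_count * ENTRY_SIZE > hdr.length:
        raise ValueError("instruction entries overrun the table")
    return hdr
```

The instruction entries then begin at byte offset 48, one every 32 bytes.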

18.5.1. Serialization Action Table

A Serialization Action is defined as a series of Serialization Instructions on registers that result in a well-known action. A Serialization Instruction is a Serialization Action primitive and consists of either reading or writing an abstracted hardware register. The Serialization Action Table contains Serialization Instruction Entries for all the Serialization Actions the platform supports.

In most cases, a Serialization Action comprises only one Serialization Instruction, but a more complex device may require more than one. When an action does comprise more than one instruction, the instructions must be listed consecutively. They are then performed sequentially, in the order in which they appear in the Serialization Action Table.
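Because the instructions for one action are consecutive and run in table order, OSPM could index them by action once, when the table is loaded. The following is a minimal sketch under two assumptions. First, each entry is reduced to an `(action, instruction)` pair for brevity; a real entry also carries Flags, Register Region, Value, and Mask. Second, `group_by_action` is a hypothetical helper, and rejecting non-consecutive groups is one possible policy, not a requirement stated here.

```python
from typing import Dict, Iterable, List, Tuple

# (Serialization Action, Instruction); other entry fields omitted for brevity.
Entry = Tuple[int, int]


def group_by_action(entries: Iterable[Entry]) -> Dict[int, List[Entry]]:
    """Map each action to its instructions, preserving table order."""
    actions: Dict[int, List[Entry]] = {}
    last_action = None
    for entry in entries:
        action = entry[0]
        # An action reappearing after a different one breaks the
        # "listed consecutively" rule.
        if action in actions and action != last_action:
            raise ValueError(f"instructions for action {action:#x} are not consecutive")
        actions.setdefault(action, []).append(entry)
        last_action = action
    return actions
```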

18.5.1.1. Serialization Actions

This section identifies the Serialization Actions that comprise the Error Record Serialization interface, as shown in the following table.

Table 18.17 Error Record Serialization Actions

| Value | Name | Description |
|---|---|---|
| 0x0 | BEGIN_WRITE_OPERATION | Indicates to the platform that an error record write operation is beginning. This allows the platform to set its operational context. |
| 0x1 | BEGIN_READ_OPERATION | Indicates to the platform that an error record read operation is beginning. This allows the platform to set its operational context. |
| 0x2 | BEGIN_CLEAR_OPERATION | Indicates to the platform that an error record clear operation is beginning. This allows the platform to set its operational context. |
| 0x3 | END_OPERATION | Indicates to the platform that the current error record operation has ended. This allows the platform to clear its operational context. |
| 0x4 | SET_RECORD_OFFSET | Sets the offset, from the base of the Error Log Address Range, at which an error record is transferred. |
| 0x5 | EXECUTE_OPERATION | Instructs the platform to carry out the current operation based on the current operational context. |
| 0x6 | CHECK_BUSY_STATUS | Returns the state of the current operation. Once an operation has been started through the EXECUTE_OPERATION action, the platform is required to indicate that the operation is in progress until the operation completes. This allows OSPM to poll for completion by repeatedly executing the CHECK_BUSY_STATUS action until the platform indicates that the operation is not busy. |
| 0x7 | GET_COMMAND_STATUS | Returns the status of the current operation. The platform is expected to maintain a status code for each operation. Bits [8:1] of the value returned from the Register Region indicate the command status, which requires that the Bit Offset of the GAS for the Register Region be set to 1. See Table 18.18 (Command Status Definition) for a list of valid command status codes. |
| 0x8 | GET_RECORD_IDENTIFIER | Returns the record identifier of an existing error record on the persistent store. The error record identifier is a 64-bit unsigned value as defined in the appendices of the UEFI Specification. If the record store is empty, this action must return 0xFFFFFFFFFFFFFFFF. |
| 0x9 | SET_RECORD_IDENTIFIER | Sets the record identifier. The error record identifier is a 64-bit unsigned value as defined in the appendices of the UEFI Specification. |
| 0xA | GET_RECORD_COUNT | Retrieves the number of error records currently stored on the platform’s persistent store. The platform is expected to maintain a count of the error records resident in its persistent store. |
| 0xB | BEGIN_DUMMY_WRITE_OPERATION | Indicates to the platform that a dummy error record write operation is beginning. This allows the platform to set its operational context. A dummy error record write operation performs no actual transfer of information from the Error Log Address Range to the persistent store. |
| 0xC | RESERVED | Reserved. |
| 0xD | GET_ERROR_LOG_ADDRESS_RANGE | Returns the 64-bit physical address OSPM uses as the buffer for reading and writing error records. |
| 0xE | GET_ERROR_LOG_ADDRESS_RANGE_LENGTH | Returns the length, in bytes, of the Error Log Address Range. |
| 0xF | GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES | Returns attributes that describe the behavior of the Error Log Address Range. Bit [0] (0x1): Reserved. Bit [1] (0x2): Non-Volatile; indicates that the Error Log Address Range is in non-volatile RAM. Bit [2] (0x4): Slow; indicates that the memory in which the Error Log Address Range is located has slow access times. All other bits are reserved. |
| 0x10 | GET_EXECUTE_OPERATION_TIMINGS | Returns an encoded QWORD. Bits [63:32]: the maximum time, in microseconds, that the platform expects to take to process and complete an EXECUTE_OPERATION. Bits [31:0]: the nominal time, in microseconds, that the platform expects to take to process and complete an EXECUTE_OPERATION. |
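Two of these actions return packed values. The following minimal Python sketch decodes them. The function and constant names are illustrative only, and reserved attribute bits are simply ignored here.

```python
from typing import NamedTuple

ATTR_NON_VOLATILE = 0x2  # Bit [1]
ATTR_SLOW = 0x4          # Bit [2]


class ExecuteTimings(NamedTuple):
    max_us: int      # bits [63:32]
    nominal_us: int  # bits [31:0]


def decode_execute_timings(qword: int) -> ExecuteTimings:
    """Split a GET_EXECUTE_OPERATION_TIMINGS result into its two halves."""
    return ExecuteTimings(max_us=(qword >> 32) & 0xFFFFFFFF,
                          nominal_us=qword & 0xFFFFFFFF)


def decode_range_attributes(attrs: int) -> dict:
    """Decode GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES (reserved bits ignored)."""
    return {
        "non_volatile": bool(attrs & ATTR_NON_VOLATILE),
        "slow": bool(attrs & ATTR_SLOW),
    }
```

One possible use of `max_us`, not mandated by this section, is as an upper bound on how long OSPM keeps polling CHECK_BUSY_STATUS.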

The following table defines the serialization action status codes returned from GET_COMMAND_STATUS.

Table 18.18 Command Status Definition

| Value | Description |
|---|---|
| 0x00 | Success |
| 0x01 | Not Enough Space |
| 0x02 | Hardware Not Available |
| 0x03 | Failed |
| 0x04 | Record Store Empty |
| 0x05 | Record Not Found |
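GET_COMMAND_STATUS places the status in bits [8:1] of the register. The following sketch shows that extraction for a raw register value. It assumes an 8-bit field (mask 0xFF after the shift by 1). When OSPM runs the action through a READ_REGISTER instruction with Bit Offset 1, the shift is already applied. The enum and function names are illustrative.

```python
from enum import IntEnum


class CommandStatus(IntEnum):
    SUCCESS = 0x00
    NOT_ENOUGH_SPACE = 0x01
    HARDWARE_NOT_AVAILABLE = 0x02
    FAILED = 0x03
    RECORD_STORE_EMPTY = 0x04
    RECORD_NOT_FOUND = 0x05


def status_from_raw_register(raw: int) -> CommandStatus:
    """Extract bits [8:1] of a raw status register and map them to a code."""
    code = (raw >> 1) & 0xFF
    return CommandStatus(code)  # raises ValueError for codes not in Table 18.18
```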

18.5.1.2. Serialization Instruction Entries

Each Serialization Action consists of a series of one or more Serialization Instructions. A Serialization Instruction represents a primitive operation on an abstracted hardware register represented by the register region as defined in a Serialization Instruction Entry.

A Serialization Instruction Entry describes a region in a serialization hardware register and the serialization instruction to be performed on that region. The following table details the layout of a Serialization Instruction Entry.

Table 18.19 Serialization Instruction Entry

| Field | Byte Length | Byte Offset | Description |
|---|---|---|---|
| Serialization Action | 1 | N+0 | The serialization action that this serialization instruction is a part of. |
| Instruction | 1 | N+1 | Identifies the instruction to execute. See Table 18.20 (Serialization Instructions) for a list of valid serialization instructions. |
| Flags | 1 | N+2 | Flags that qualify the instruction. |
| Reserved | 1 | N+3 | Must be zero. |
| Register Region | 12 | N+4 | Generic Address Structure, as defined in Section 5.2.3.2, describing the address and bit range of the register region. |
| Value | 8 | N+16 | Value used with READ_REGISTER_VALUE and WRITE_REGISTER_VALUE instructions. |
| Mask | 8 | N+24 | The bit mask required to obtain the bits corresponding to the serialization instruction in a given bit range defined by the register region. |
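The following sketch decodes one 32-byte entry with Python's `struct` module. It assumes the standard 12-byte Generic Address Structure layout: Address Space ID, Register Bit Width, Register Bit Offset, and Access Size (one byte each), followed by a 64-bit Address. `InstructionEntry` and `parse_entries` are hypothetical names.

```python
import struct
from typing import List, NamedTuple

# Action, Instruction, Flags, Reserved (1 byte each); the 12-byte GAS;
# then the 64-bit Value and Mask. Little-endian, no padding.
ENTRY_FMT = "<BBBB" + "BBBBQ" + "QQ"
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)  # 32


class InstructionEntry(NamedTuple):
    action: int
    instruction: int
    flags: int
    reserved: int
    space_id: int      # GAS Address Space ID
    bit_width: int     # GAS Register Bit Width
    bit_offset: int    # GAS Register Bit Offset
    access_size: int   # GAS Access Size
    address: int       # GAS Address
    value: int
    mask: int


def parse_entries(data: bytes, count: int, offset: int = 48) -> List[InstructionEntry]:
    """Decode `count` consecutive entries; 48 is the offset given in Table 18.16."""
    return [InstructionEntry(*struct.unpack_from(ENTRY_FMT, data, offset + ENTRY_SIZE * i))
            for i in range(count)]
```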

Register Region is described as a generic address structure. This structure describes the physical address of a register as well as the bit range that corresponds to a desired region of the register. The bit range is defined as the smallest set of consecutive bits that contains every bit in the register associated with the Serialization Instruction. For example, if bits [6:5] and bits [3:2] all correspond to a Serialization Instruction, the bit range for that instruction is [6:2].

Because a bit range can contain bits that do not pertain to a particular Serialization Instruction (bit 4 in the example above), a bit mask is required to distinguish all the bits in the region that correspond to the instruction. The Mask field is defined to be this bit mask, with a bit set to ‘1’ for each bit in the bit range (defined by the register region) that corresponds to the Serialization Instruction. Note that bit 0 of the bit mask corresponds to the lowest bit in the bit range. In the example above, the mask is 11011b, or 0x1B.
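The bit-range and mask rule can be checked mechanically. The following minimal sketch uses a hypothetical helper name. It returns the lowest bit of the range, which is the Bit Offset used by the register pseudocode below, together with the highest bit and the Mask.

```python
from typing import Iterable, Tuple


def bit_range_and_mask(bits: Iterable[int]) -> Tuple[int, int, int]:
    """Return (low_bit, high_bit, mask) for the register bits an instruction uses.

    The range is the smallest run of consecutive bits containing every used
    bit; mask bit 0 corresponds to low_bit, the lowest bit of that range.
    """
    used = sorted(set(bits))
    if not used:
        raise ValueError("an instruction must use at least one bit")
    low, high = used[0], used[-1]
    mask = 0
    for b in used:
        mask |= 1 << (b - low)
    return low, high, mask
```

For the example in the text, `bit_range_and_mask([6, 5, 3, 2])` returns `(2, 6, 0x1B)`.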

The Instruction field identifies the operation to be performed on the register region by the instruction entry. The following table identifies the instructions that are supported.

Table 18.20 Serialization Instructions

| Value | Name | Description |
|---|---|---|
| 0x00 | READ_REGISTER | A READ_REGISTER instruction reads the designated information from the specified Register Region. |
| 0x01 | READ_REGISTER_VALUE | A READ_REGISTER_VALUE instruction reads the designated information from the specified Register Region and compares the results with the contents of the Value field. If the information read matches the contents of the Value field, TRUE is returned; otherwise, FALSE is returned. |
| 0x02 | WRITE_REGISTER | A WRITE_REGISTER instruction writes a value to the specified Register Region. The Value field is ignored. |
| 0x03 | WRITE_REGISTER_VALUE | A WRITE_REGISTER_VALUE instruction writes the contents of the Value field to the specified Register Region. |
| 0x04 | NOOP | This instruction is a NOOP. |
| 0x05 | LOAD_VAR1 | Loads the VAR1 variable from the register region. |
| 0x06 | LOAD_VAR2 | Loads the VAR2 variable from the register region. |
| 0x07 | STORE_VAR1 | Stores the value in VAR1 to the indicated register region. |
| 0x08 | ADD | Adds VAR1 and VAR2 and stores the result in VAR1. |
| 0x09 | SUBTRACT | Subtracts VAR1 from VAR2 and stores the result in VAR1. |
| 0x0A | ADD_VALUE | Adds the contents of the specified register region to Value and stores the result in the register region. |
| 0x0B | SUBTRACT_VALUE | Subtracts Value from the contents of the specified register region and stores the result in the register region. |
| 0x0C | STALL | Stalls for the number of microseconds specified in Value. |
| 0x0D | STALL_WHILE_TRUE | OSPM repeatedly compares the contents of the specified register region to Value until the values are not equal. OSPM stalls between successive comparisons. The amount of time to stall is specified by VAR1 and is expressed in microseconds. |
| 0x0E | SKIP_NEXT_INSTRUCTION_IF_TRUE | This is a control instruction that compares the contents of the register region with Value. If the values match, OSPM skips the next instruction in the sequence for the current action. |
| 0x0F | GOTO | OSPM goes to the instruction specified by Value. The instruction is specified as a zero-based index; each instruction for a given action has an index based on its relative position in the array of instructions for that action. |
| 0x10 | SET_SRC_ADDRESS_BASE | Sets the SRC_BASE variable used by the MOVE_DATA instruction to the contents of the register region. |
| 0x11 | SET_DST_ADDRESS_BASE | Sets the DST_BASE variable used by the MOVE_DATA instruction to the contents of the register region. |
| 0x12 | MOVE_DATA | Moves VAR2 bytes of data from SRC_BASE + Offset to DST_BASE + Offset, where Offset is the contents of the register region. |
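To show how the variable and control instructions combine within one action, here is a toy interpreter. It is a sketch, not a complete implementation, and makes several simplifying assumptions:

- Each Register Region is modeled as a whole 64-bit register in a dictionary keyed by a made-up address, so bit offsets, masks, and PRESERVE_REGISTER are ignored.
- SUBTRACT, the STALL instructions, and the MOVE_DATA family are left out.
- The `max_steps` guard against a runaway GOTO loop is an added safety choice, not something this section requires.

All names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Insn(NamedTuple):
    op: int           # Instruction code from Table 18.20
    address: int = 0  # stands in for the Register Region (whole register)
    value: int = 0    # the entry's Value field


READ_REGISTER, WRITE_REGISTER_VALUE, NOOP = 0x00, 0x03, 0x04
LOAD_VAR1, LOAD_VAR2, STORE_VAR1, ADD = 0x05, 0x06, 0x07, 0x08
ADD_VALUE, SUBTRACT_VALUE = 0x0A, 0x0B
SKIP_NEXT_INSTRUCTION_IF_TRUE, GOTO = 0x0E, 0x0F
U64 = (1 << 64) - 1


def run_action(insns: List[Insn], regs: Dict[int, int],
               max_steps: int = 1000) -> Optional[int]:
    """Execute one action's instructions; return the last READ_REGISTER result."""
    var1 = var2 = 0
    result = None
    pc = steps = 0
    while pc < len(insns):
        steps += 1
        if steps > max_steps:
            raise RuntimeError("action did not terminate")
        i = insns[pc]
        pc += 1
        if i.op == READ_REGISTER:
            result = regs.get(i.address, 0)
        elif i.op == WRITE_REGISTER_VALUE:
            regs[i.address] = i.value
        elif i.op == NOOP:
            pass
        elif i.op == LOAD_VAR1:
            var1 = regs.get(i.address, 0)
        elif i.op == LOAD_VAR2:
            var2 = regs.get(i.address, 0)
        elif i.op == STORE_VAR1:
            regs[i.address] = var1
        elif i.op == ADD:
            var1 = (var1 + var2) & U64
        elif i.op == ADD_VALUE:
            regs[i.address] = (regs.get(i.address, 0) + i.value) & U64
        elif i.op == SUBTRACT_VALUE:
            regs[i.address] = (regs.get(i.address, 0) - i.value) & U64
        elif i.op == SKIP_NEXT_INSTRUCTION_IF_TRUE:
            if regs.get(i.address, 0) == i.value:
                pc += 1  # skip the next instruction of this action
        elif i.op == GOTO:
            pc = i.value  # zero-based index within this action
        else:
            raise NotImplementedError(f"instruction {i.op:#x} not modeled")
    return result
```

A GOTO to an index one past the last instruction ends the action in this model.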

The Flags field allows qualifying flags to be associated with the instruction. The following table identifies the flags that can be associated with Serialization Instructions.

Table 18.21 Instruction Flags

| Value | Name | Description |
|---|---|---|
| 0x01 | PRESERVE_REGISTER | For WRITE_REGISTER and WRITE_REGISTER_VALUE instructions, this flag indicates that bits within the register that are not being written must be preserved rather than destroyed. For READ_REGISTER instructions, this flag is ignored. |

18.5.1.2.1. READ_REGISTER_VALUE

A read register value instruction reads the register region and compares the result with the specified value. If the values are not equal, the instruction fails. This can be described in pseudocode as follows:

X = Read(register)
X = X >> Bit Offset described in Register Region
X = X & Mask
If (X != Value) FAIL
SUCCEED

18.5.1.2.2. READ_REGISTER

A read register instruction reads the register region. The result is a generic value and should not be compared with Value; Value is ignored. This can be described in pseudocode as follows:

X = Read(register)
X = X >> Bit Offset described in Register Region
X = X & Mask
Return X
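The two read instructions translate directly into Python. In the following sketch, `read` is a hypothetical zero-argument callable that stands in for the hardware register access. The Bit Offset and Mask come from the instruction entry.

```python
from typing import Callable


def read_register(read: Callable[[], int], bit_offset: int, mask: int) -> int:
    """READ_REGISTER: shift the region down to bit 0, then apply Mask."""
    x = read()        # X = Read(register)
    x >>= bit_offset  # X = X >> Bit Offset described in Register Region
    return x & mask   # X = X & Mask


def read_register_value(read: Callable[[], int], bit_offset: int,
                        mask: int, value: int) -> bool:
    """READ_REGISTER_VALUE: succeed (True) only if the masked region equals Value."""
    return read_register(read, bit_offset, mask) == value
```

With the [6:2] range and mask 0x1B from the earlier example, a register holding 0x7C reads back as 0x1B. Bit 4 never affects the result.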

18.5.1.2.3. WRITE_REGISTER_VALUE

A write register value instruction writes the specified value to the register region. If PRESERVE_REGISTER is set in Instruction Flags, then the bits not corresponding to the write value instruction are preserved. If the register is preserved, the write value instruction requires a read of the register. This can be described in pseudocode as follows:

X = Value & Mask
X = X << Bit Offset described in Register Region
If (Preserve Register)
    Y = Read(register)
    Y = Y & ~(Mask << Bit Offset)
    X = X | Y
Write(X, Register)

18.5.1.2.4. WRITE_REGISTER

A write register instruction writes a value to the register region; Value is ignored. If PRESERVE_REGISTER is set in Instruction Flags, then the bits not corresponding to the write instruction are preserved. If the register is preserved, the write instruction requires a read of the register. This can be described in pseudocode as follows:

X = supplied value
X = X & Mask
X = X << Bit Offset described in Register Region
If (Preserve Register)
    Y = Read(register)
    Y = Y & ~(Mask << Bit Offset)
    X = X | Y
Write(X, Register)
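WRITE_REGISTER and WRITE_REGISTER_VALUE share the same body and differ only in where the value comes from. The following sketch covers both. `write` and `read` are hypothetical callables standing in for hardware access, and `value` is either the OSPM-supplied value or the entry's Value field.

```python
from typing import Callable, Optional


def write_register(write: Callable[[int], None], value: int, bit_offset: int,
                   mask: int, preserve: bool = False,
                   read: Optional[Callable[[], int]] = None) -> None:
    """Shared body of WRITE_REGISTER and WRITE_REGISTER_VALUE."""
    x = (value & mask) << bit_offset  # X = (value & Mask) << Bit Offset
    if preserve:  # PRESERVE_REGISTER set in Instruction Flags
        if read is None:
            raise ValueError("PRESERVE_REGISTER requires a register read")
        y = read() & ~(mask << bit_offset)  # keep bits outside the region
        x |= y
    write(x)
```

Clearing the [6:2] region of a register that holds 0xFF, with PRESERVE_REGISTER set, leaves 0x93.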

18.5.1.3. Error Record Serialization Information

The APEI error record includes an 8-byte field called OSPM Reserved. The following table defines the layout of this field. The error record serialization information is a small buffer the platform can use for serialization bookkeeping. The platform is free to use the 48 bits starting at bit offset 16 for its own purposes. For example, it may use these bits to indicate the busy/free status of an error record or to record an internal identifier.

Table 18.22 Error Record Serialization Info

| Field | Bit Length | Bit Offset | Description |
|---|---|---|---|
| Signature | 16 | 0 | 16-bit signature (‘ER’) identifying the start of the error record serialization data. |
| Platform Serialization Data | 48 | 16 | Platform private error record serialization information. |
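The following minimal sketch builds and splits this 8-byte field. It assumes a little-endian layout in which the ASCII characters ‘E’ and ‘R’ occupy bytes 0 and 1 in that order; the table itself does not fix the byte order. The helper names are hypothetical. OSPM fills in only the signature; the remaining 6 bytes belong to the platform.

```python
ER_SIGNATURE = b"ER"


def pack_serialization_info(platform_data: bytes = bytes(6)) -> bytes:
    """Build the 8-byte field: signature in bits [15:0], platform data in [63:16]."""
    if len(platform_data) != 6:
        raise ValueError("platform serialization data is 48 bits (6 bytes)")
    return ER_SIGNATURE + platform_data


def split_serialization_info(field: bytes):
    """Return (signature_ok, platform_data) for an 8-byte field."""
    if len(field) != 8:
        raise ValueError("error record serialization info is 8 bytes")
    return field[:2] == ER_SIGNATURE, field[2:]
```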

18.5.2. Operations

The error record serialization interface comprises three operations: Write, Read, and Clear. OSPM uses the Write operation to write a single error record to the persistent store. The Read operation is used to retrieve a single error record previously recorded to the persistent store using the write operation. The Clear operation allows OSPM to notify the platform that a given error record has been fully processed and is no longer needed, allowing the platform to recover the storage associated with a cleared error record.

Where the Error Log Address Range is NVRAM, significant optimizations are possible since transfer from the Error Log Address Range to a separate storage device is unnecessary. The platform may still, however, copy the record from NVRAM to another device, should it choose to. This allows, for example, the platform to copy error records to private log files. In order to give the platform the opportunity to do this, OSPM must use the Write operation to persist error records even when the Error Log Address Range is NVRAM. The Read and Clear operations, however, are unnecessary in this case as OSPM is capable of reading and clearing error records without assistance from the platform.

18.5.2.1. Writing

To write a single HW error record, OSPM executes the following steps:

  1. Initializes the error record’s serialization info. OSPM must fill in the Signature.

  2. Writes the error record to be persisted into the Error Log Address Range.

  3. Executes the BEGIN_WRITE_OPERATION action to notify the platform that a record write operation is beginning.

  4. Executes the SET_RECORD_OFFSET action to inform the platform where in the Error Log Address Range the error record resides.

  5. Executes the EXECUTE_OPERATION action to instruct the platform to begin the write operation.

  6. Busy waits by repeatedly executing the CHECK_BUSY_STATUS action until FALSE is returned.

  7. Executes a GET_COMMAND_STATUS action to determine the status of the write operation. If an error is indicated, OSPM may retry the operation.

  8. Executes an END_OPERATION action to notify the platform that the record write operation is complete.
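The following sketch shows the OSPM write sequence above under several assumptions:

- `run(action, arg)` is a hypothetical hook that executes every Serialization Instruction of one action and returns the result of its read instruction, if any.
- The caller has already filled in the ‘ER’ signature.
- The poll limit, and issuing END_OPERATION even on timeout, are choices made for this sketch.

Retrying is left to the caller, which receives the GET_COMMAND_STATUS code.

```python
from typing import Callable, Optional

BEGIN_WRITE_OPERATION = 0x0
END_OPERATION = 0x3
SET_RECORD_OFFSET = 0x4
EXECUTE_OPERATION = 0x5
CHECK_BUSY_STATUS = 0x6
GET_COMMAND_STATUS = 0x7
SUCCESS = 0x00

# Hypothetical hook: runs one action; the second argument is the value OSPM
# supplies to WRITE_REGISTER instructions, the return value the read result.
ExecuteAction = Callable[[int, Optional[int]], int]


def write_record(run: ExecuteAction, log_range: bytearray, record: bytes,
                 offset: int, max_polls: int = 100_000) -> int:
    """Persist one record and return its GET_COMMAND_STATUS code."""
    if offset < 0 or offset + len(record) > len(log_range):
        raise ValueError("record does not fit in the Error Log Address Range")
    log_range[offset:offset + len(record)] = record  # copy into the range
    run(BEGIN_WRITE_OPERATION, None)
    try:
        run(SET_RECORD_OFFSET, offset)
        run(EXECUTE_OPERATION, None)
        for _ in range(max_polls):  # busy-wait on CHECK_BUSY_STATUS
            if not run(CHECK_BUSY_STATUS, None):
                break
        else:
            raise TimeoutError("platform still busy after max_polls checks")
        return run(GET_COMMAND_STATUS, None)
    finally:
        run(END_OPERATION, None)
```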

When OSPM performs the EXECUTE_OPERATION action in the context of a record write operation, the platform attempts to transfer the error record from the designated offset in the Error Log Address Range to a persistent store of its choice. If the Error Log Address Range is non-volatile RAM, no transfer is required.

Where the platform is required to transfer the error record from the Error Log Address Range to a persistent store, it performs the following steps in response to receiving a write command:

  1. Sets some internal state to indicate that it is busy. OSPM polls by executing a CHECK_BUSY_STATUS action until the operation is completed.

  2. Reads the error record’s Record ID field to determine where on the storage medium the supplied error record is to be written. The platform attempts to locate the specified error record on the persistent store.

    • If the specified error record does not exist, the platform attempts to write a new record to the persistent store.

    • If the specified error record does exist and is large enough to be overwritten by the supplied error record, the platform can do an in-place replacement. If the existing record is not large enough, the platform must attempt to locate space in which to write the new record. It may mark the existing record as Free and coalesce adjacent free records to create the necessary space.

  3. Transfers the error record to the selected location on the persistent store.

  4. Updates an internal Record Count if a new record was written.

  5. Records the status of the operation so OSPM can retrieve the status by executing a GET_COMMAND_STATUS action.

  6. Modifies internal busy state as necessary so that when OSPM executes CHECK_BUSY_STATUS, the result indicates that the operation is complete.
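The following toy model illustrates the platform-side write steps. It is a hypothetical in-memory store keyed by Record ID; parsing the Record ID out of the CPER header is omitted, so the ID is passed in directly. Space accounting is reduced to a byte capacity, with the old copy's space counted as reclaimable. The busy flag toggles synchronously here, whereas a real platform would typically complete the transfer asynchronously.

```python
from typing import Dict

SUCCESS = 0x00
NOT_ENOUGH_SPACE = 0x01


class ToyStore:
    """Hypothetical platform store illustrating the write steps above."""

    def __init__(self, capacity: int):
        self.capacity = capacity          # bytes available on the medium
        self.records: Dict[int, bytes] = {}
        self.busy = False
        self.status = SUCCESS

    def used(self) -> int:
        return sum(len(r) for r in self.records.values())

    @property
    def record_count(self) -> int:
        # Grows only when a new Record ID is stored (step 4).
        return len(self.records)

    def handle_write(self, record_id: int, record: bytes) -> None:
        self.busy = True                                # step 1
        existing = self.records.get(record_id)          # step 2: locate record
        reclaimable = len(existing) if existing is not None else 0
        if self.used() - reclaimable + len(record) > self.capacity:
            self.status = NOT_ENOUGH_SPACE              # step 5 (failure)
        else:
            self.records[record_id] = record            # steps 3 and 4
            self.status = SUCCESS                       # step 5
        self.busy = False                               # step 6
```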

If the Error Log Address Range resides in NVRAM, the minimum steps required of the platform are:

  1. Sets some internal state to indicate that it is busy. OSPM polls by executing a CHECK_BUSY_STATUS action until the operation is completed.

  2. Records the status of the operation so OSPM can retrieve the status by executing a GET_COMMAND_STATUS action.

  3. Clears internal busy state so that when OSPM executes CHECK_BUSY_STATUS, the result indicates that the operation is complete.

18.5.2.2. Reading

During boot, OSPM attempts to retrieve all serialized error records from the persistent store. If the Error Log Address Range does not reside in NVRAM, the following steps are executed by OSPM to retrieve all error records:

  1. Executes the BEGIN_READ_OPERATION action to notify the platform that a record read operation is beginning.

  2. Executes the SET_RECORD_OFFSET action to inform the platform at what offset in the Error Log Address Range the error record is to be transferred.

  3. Executes the SET_RECORD_IDENTIFIER action to inform the platform which error record is to be read from its persistent store.

  4. Executes the EXECUTE_OPERATION action to instruct the platform to begin the read operation.

  5. Busy waits by repeatedly executing the CHECK_BUSY_STATUS action until FALSE is returned.

  6. Executes a GET_COMMAND_STATUS action to determine the status of the read operation.

    • If the status is Record Store Empty (0x04), continue to step 7.

    • If an error occurred reading a valid error record, the status is Failed (0x03); continue to step 7.

    • If the status is Record Not Found (0x05), indicating that the specified error record does not exist, OSPM retrieves a valid identifier by executing a GET_RECORD_IDENTIFIER action. The platform returns a valid record identifier.

    • If the status is Success (0x00), OSPM transfers the retrieved record from the Error Log Address Range to a private buffer and then executes the GET_RECORD_IDENTIFIER action to determine the identifier of the next record in the persistent store.

  7. Executes an END_OPERATION action to notify the platform that the record read operation is complete.
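The following sketch shows one way OSPM could repeat the steps above until every record has been retrieved. It makes several assumptions:

- `run(action, arg)` is the same kind of hypothetical action hook as before.
- Each read is a full BEGIN_READ_OPERATION to END_OPERATION cycle.
- Records are keyed by the Record ID at byte offset 96 of the CPER record header, as laid out in the UEFI Specification.
- The loop stops on 0xFFFFFFFFFFFFFFFF, on an identifier it has already collected (in case the platform wraps around), or after `max_reads` iterations.

These termination rules are choices made for this sketch.

```python
from typing import Callable, Dict, Optional, Tuple

BEGIN_READ_OPERATION, END_OPERATION, SET_RECORD_OFFSET = 0x1, 0x3, 0x4
EXECUTE_OPERATION, CHECK_BUSY_STATUS, GET_COMMAND_STATUS = 0x5, 0x6, 0x7
GET_RECORD_IDENTIFIER, SET_RECORD_IDENTIFIER = 0x8, 0x9
SUCCESS, RECORD_NOT_FOUND = 0x00, 0x05
NO_RECORD = 0xFFFFFFFFFFFFFFFF   # GET_RECORD_IDENTIFIER on an empty store
CPER_RECORD_ID_OFFSET = 96       # Record ID field of the CPER record header

ExecuteAction = Callable[[int, Optional[int]], int]


def read_one(run: ExecuteAction, record_id: int, offset: int) -> Tuple[int, Optional[int]]:
    """Steps 1-7 for one record; returns (status, next identifier or None)."""
    run(BEGIN_READ_OPERATION, None)          # step 1
    run(SET_RECORD_OFFSET, offset)           # step 2
    run(SET_RECORD_IDENTIFIER, record_id)    # step 3
    run(EXECUTE_OPERATION, None)             # step 4
    while run(CHECK_BUSY_STATUS, None):      # step 5 (no timeout, for brevity)
        pass
    status = run(GET_COMMAND_STATUS, None)   # step 6
    next_id = None
    if status in (SUCCESS, RECORD_NOT_FOUND):
        next_id = run(GET_RECORD_IDENTIFIER, None)
    run(END_OPERATION, None)                 # step 7
    return status, next_id


def read_all(run: ExecuteAction, log_range: bytearray, offset: int,
             record_len: int, max_reads: int = 1024) -> Dict[int, bytes]:
    """Collect every persisted record, keyed by its CPER Record ID."""
    records: Dict[int, bytes] = {}
    record_id = 0  # 0x0: let the platform choose its 'first' record
    for _ in range(max_reads):
        status, next_id = read_one(run, record_id, offset)
        if status == SUCCESS:
            rec = bytes(log_range[offset:offset + record_len])
            rid = int.from_bytes(rec[CPER_RECORD_ID_OFFSET:CPER_RECORD_ID_OFFSET + 8], "little")
            records[rid] = rec
        elif status != RECORD_NOT_FOUND:
            break  # Record Store Empty, Failed, or another error
        if next_id is None or next_id == NO_RECORD or next_id in records:
            break  # nothing further, or the platform wrapped around
        record_id = next_id
    return records
```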

The steps performed by the platform to carry out a read request are as follows:

  1. Sets some internal state to indicate that it is busy. OSPM polls by executing a CHECK_BUSY_STATUS action until the operation is completed.

  2. Using the record identifier supplied by OSPM through the SET_RECORD_IDENTIFIER operation, determine which error record to read:

    • If the identifier is 0x0 (unspecified), the platform reads the ‘first’ error record from its persistent store (first being implementation specific).

    • If the identifier is non-zero, the platform attempts to locate the specified error record on the persistent store.

    • If the specified error record does not exist, set the status register’s Status to Record Not Found (0x05), and update the status register’s Identifier field with the identifier of the ‘first’ error record.

  3. Transfer the record from the persistent store to the offset specified by OSPM from the base of the Error Log Address Range.

  4. Record the Identifier of the ‘next’ valid error record that resides on the persistent store. This allows OSPM to retrieve a valid record identifier by executing a GET_RECORD_IDENTIFIER operation.

  5. Record the status of the operation so OSPM can retrieve the status by executing a GET_COMMAND_STATUS action.

  6. Clear internal busy state so when OSPM executes CHECK_BUSY_STATUS, the result indicates that the operation is complete.

Where the Error Log Address Range does reside in NVRAM, OSPM requires no platform support to read persisted error records. OSPM can scan the Error Log Address Range on its own and retrieve the error records it previously persisted.

18.5.2.3. Clearing

After OSPM has finished processing an error record, it will notify the platform by clearing the record. This allows the platform to delete the record from the persistent store or mark it such that the space is free and can be reused. The following steps are executed by OSPM to clear an error record:

  1. Executes a BEGIN_CLEAR_OPERATION action to notify the platform that a record clear operation is beginning.

  2. Executes a SET_RECORD_IDENTIFIER action to inform the platform which error record is to be cleared. This value must not be set to 0x0 (unspecified).

  3. Executes an EXECUTE_OPERATION action to instruct the platform to begin the clear operation.

  4. Busy waits by repeatedly executing the CHECK_BUSY_STATUS action until FALSE is returned.

  5. Executes a GET_COMMAND_STATUS action to determine the status of the clear operation.

  6. Executes an END_OPERATION action to notify the platform that the record clear operation is complete.
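The clear sequence is the shortest of the three. The following minimal sketch uses the same kind of hypothetical `run(action, arg)` hook as before. It polls without a timeout for brevity and rejects the unspecified identifier 0x0 before talking to the platform at all.

```python
from typing import Callable, Optional

BEGIN_CLEAR_OPERATION, END_OPERATION = 0x2, 0x3
EXECUTE_OPERATION, CHECK_BUSY_STATUS, GET_COMMAND_STATUS = 0x5, 0x6, 0x7
SET_RECORD_IDENTIFIER = 0x9

ExecuteAction = Callable[[int, Optional[int]], int]


def clear_record(run: ExecuteAction, record_id: int) -> int:
    """Clear one processed record; returns the GET_COMMAND_STATUS code."""
    if record_id == 0:
        raise ValueError("0x0 (unspecified) is not a valid identifier for Clear")
    run(BEGIN_CLEAR_OPERATION, None)           # step 1
    try:
        run(SET_RECORD_IDENTIFIER, record_id)  # step 2
        run(EXECUTE_OPERATION, None)           # step 3
        while run(CHECK_BUSY_STATUS, None):    # step 4
            pass
        return run(GET_COMMAND_STATUS, None)   # step 5
    finally:
        run(END_OPERATION, None)               # step 6
```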

The platform carries out a clear request by performing the following steps:

  1. Sets some internal state to indicate that it is busy. OSPM polls by executing a CHECK_BUSY_STATUS action until the operation is completed.

  2. Using the record identifier supplied by OSPM through the SET_RECORD_IDENTIFIER action, determines which error record to clear. This value must not be 0x0 (unspecified).

  3. Locates the specified error record on the persistent store.

  4. Marks the record as free by updating the Attributes in its serialization header.

  5. Updates the internal record count.

  6. Clears internal busy state so that when OSPM executes CHECK_BUSY_STATUS, the result indicates that the operation is complete.

When the Error Log Address Range resides in NVRAM, the OS requires no platform support to Clear error records.

18.5.2.4. Usage

This section describes several possible ways the error record serialization mechanism might be implemented.

18.5.2.4.1. Error Log Address Range Resides in NVRAM

If the Error Log Address Range resides in NVRAM, then when OSPM writes a record into the logging range, the record is automatically persistent and the busy bit can be cleared immediately. On a subsequent boot, OSPM can read any persisted error records directly from the persistent store range. The size of the persistent store, in this case, is expected to be enough for several error records.

18.5.2.4.2. Error Log Address Range Resides in (volatile) RAM

In this implementation, the Error Log Address Range describes an intermediate location for error records. To persist a record, OSPM copies the record into the Error Log Address Range and sets the Execute control bit. The platform then runs the necessary code (SMM code on non-UEFI-based systems and UEFI runtime code on UEFI-enabled systems) to transfer the error record from main memory to some persistent store. To read a record, OSPM asks the platform to copy a record from the persistent store to a specified offset within the Error Log Address Range. The Error Log Address Range is at least large enough for one error record.

18.5.2.4.3. Error Log Address Range Resides on Service Processor

In this type of implementation, the Error Log Address Range is really MMIO. When OSPM writes an error record to the Error Log Address Range, it is really writing to memory on a service processor. When OSPM sets the Execute control bit, the platform knows that OSPM is done writing the record and can act on it, for example by moving it into a permanent location (such as a hard disk) on the service processor. The size of the persistent store in this type of implementation is typically large enough for one error record.

18.5.2.4.4. Error Log Address Range is Copied Across Network

In this type of implementation, the Error Log Address Range is an intermediate cache for error records. To persist an error record, OSPM copies the record into the Error Log Address Range and sets the Execute control bit, and the platform runs code to transmit this error record over the wire. The size of the Error Log Address Range in this type of implementation is typically large enough for one error record.