18.5. Error Serialization
The error record serialization feature is used to save and retrieve hardware error information to and from a persistent store. OSPM interacts with the platform through a platform interface. If the Error Record Serialization Table (ERST) is present, OSPM uses the ACPI solution described below. Otherwise, OSPM uses the UEFI runtime variable services to carry out error record persistence operations on UEFI based platforms.
For error persistence across boots, the platform must implement some form of non-volatile store to save error records. The amount of space required depends on the platform’s processor architecture. Typically, this store will be flash memory or some other form of non-volatile RAM.
Serialized errors are encoded according to the Common Platform Error Record (CPER) format, which is described in the appendices of the UEFI Specification. These entries are referred to as error records.
The Error Record Serialization Interface is designed to be sufficiently abstract to allow hardware vendors flexibility in how they implement their error record serialization hardware. The platform provides details necessary to communicate with its serialization hardware by populating the ERST with a set of Serialization Instruction Entries. One or more serialization instruction entries comprise a Serialization Action. OSPM carries out serialization operations by executing a series of Serialization Actions. Serialization Actions and Serialization Instructions are described in detail in the following sections.
The following table details the ERST layout, which system firmware is responsible for building.
Field | Byte Length | Byte Offset | Description |
---|---|---|---|
ACPI Standard Header | | | |
Header Signature | 4 | 0 | “ERST”. Signature for the Error Record Serialization Table. |
Length | 4 | 4 | Length, in bytes, of entire ERST. Entire table must be contiguous. |
Revision | 1 | 8 | 1 |
Checksum | 1 | 9 | Entire table must sum to zero. |
OEMID | 6 | 10 | OEM ID. |
OEM Table ID | 8 | 16 | The manufacturer model ID. |
OEM Revision | 4 | 24 | OEM revision of the ERST for the supplied OEM table ID. |
Creator ID | 4 | 28 | Vendor ID of the utility that created the table. |
Creator Revision | 4 | 32 | Revision of the utility that created the table. |
Serialization Header | | | |
Serialization Header Size | 4 | 36 | Length in bytes of the serialization header. |
Reserved | 4 | 40 | Must be zero. |
Instruction Entry Count | 4 | 44 | The number of Serialization Instruction Entries in the Serialization Action Table. |
Serialization Action Table | | | |
Serialization Instruction Entries | | 48 | A series of error logging instruction entries. |
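As an illustration of the layout above, the fixed 48-byte portion of the table can be decoded with a short sketch like the following. It assumes little-endian field encoding (as used by ACPI tables generally) and a raw table image already available as `bytes`; the function name `parse_erst_header` and the dictionary keys are hypothetical, not part of any real API.

```python
import struct

# 36-byte ACPI standard header followed by the 12-byte serialization header.
# Little-endian encoding is assumed here.
ERST_HEADER = struct.Struct("<4sIBB6s8sI4sIIII")

def parse_erst_header(table: bytes) -> dict:
    """Decode the fixed part of an ERST image and run basic sanity checks."""
    (signature, length, revision, _checksum, oem_id, oem_table_id,
     oem_revision, creator_id, creator_revision,
     ser_header_size, reserved, entry_count) = ERST_HEADER.unpack_from(table, 0)
    if signature != b"ERST":
        raise ValueError("signature is not 'ERST'")
    if length != len(table):
        raise ValueError("Length field does not match the table image size")
    # The table says the entire table must sum to zero (modulo 256).
    if sum(table) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    return {
        "revision": revision,
        "oem_id": oem_id,
        "oem_table_id": oem_table_id,
        "oem_revision": oem_revision,
        "creator_id": creator_id,
        "creator_revision": creator_revision,
        "serialization_header_size": ser_header_size,
        "instruction_entry_count": entry_count,
        "first_entry_offset": ERST_HEADER.size,  # 48, per the table
    }
```

The Serialization Instruction Entries begin at the returned `first_entry_offset`; decoding them requires the entry layout described later in this section.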
18.5.1. Serialization Action Table
A Serialization Action is defined as a series of Serialization Instructions on registers that result in a well known action. A Serialization Instruction is a Serialization Action primitive and consists of either reading or writing an abstracted hardware register. The Serialization Action Table contains Serialization Instruction Entries for all the Serialization Actions the platform supports.
In most cases, a Serialization Action comprises only one Serialization Instruction, but it is conceivable that a more complex device will require more than one Serialization Instruction. When an action does comprise more than one instruction, the instructions must be listed consecutively and they will consequently be performed sequentially, according to their placement in the Serialization Action Table.
18.5.1.1. Serialization Actions
This section identifies the Serialization Actions that comprise the Error Record Serialization interface, as shown in the following table.
Value | Name | Description |
---|---|---|
0x0 | BEGIN_WRITE_OPERATION | Indicates to the platform that an error record write operation is beginning. This allows the platform to set its operational context. |
0x1 | BEGIN_READ_OPERATION | Indicates to the platform that an error record read operation is beginning. This allows the platform to set its operational context. |
0x2 | BEGIN_CLEAR_OPERATION | Indicates to the platform that an error record clear operation is beginning. This allows the platform to set its operational context. |
0x3 | END_OPERATION | Indicates to the platform that the current error record operation has ended. This allows the platform to clear its operational context. |
0x4 | SET_RECORD_OFFSET | Sets the offset from the base of the Error Log to transfer an error record. |
0x5 | EXECUTE_OPERATION | Instructs the platform to carry out the current operation based on the current operational context. |
0x6 | CHECK_BUSY_STATUS | Returns the state of the current operation. Once an operation has been executed through the EXECUTE_OPERATION action, the platform is required to return an indication that the operation is in progress until the operation completes. This allows the OS to poll for completion by repeatedly executing the CHECK_BUSY_STATUS action until the platform indicates that the operation is no longer busy. |
0x7 | GET_COMMAND_STATUS | Returns the status of the current operation. The platform is expected to maintain a status code for each operation. Bits [8:1] of the value returned from the Register Region indicate the command status, which requires that the Bit Offset of the GAS for the Register Region is set to 1. See the Command Status Definition table below for a list of valid command status codes. |
0x8 | GET_RECORD_IDENTIFIER | Returns the record identifier of an existing error record on the persistent store. The error record identifier is a 64-bit unsigned value as defined in the appendices of the UEFI Specification. If the record store is empty, this action must return 0xFFFFFFFFFFFFFFFF. |
0x9 | SET_RECORD_IDENTIFIER | Sets the record identifier. The error record identifier is a 64-bit unsigned value as defined in the appendices of the UEFI Specification. |
0xA | GET_RECORD_COUNT | Retrieves the number of error records currently stored on the platform’s persistent store. The platform is expected to maintain a count of the number of error records resident in its persistent store. |
0xB | BEGIN_DUMMY_WRITE_OPERATION | Indicates to the platform that a dummy error record write operation is beginning. This allows the platform to set its operational context. A dummy error record write operation performs no actual transfer of information from the Error Log Address Range to the persistent store. |
0xC | RESERVED | Reserved. |
0xD | GET_ERROR_LOG_ADDRESS_RANGE | Returns the 64-bit physical address OSPM uses as the buffer for reading/writing error records. |
0xE | GET_ERROR_LOG_ADDRESS_RANGE_LENGTH | Returns the length in bytes of the Error Log Address Range. |
0xF | GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES | Returns attributes that describe the behavior of the error log address range. Bit [0] (0x1) - Reserved. Bit [1] (0x2) - Non-Volatile: Indicates that the error log address range is in non-volatile RAM. Bit [2] (0x4) - Slow: Indicates that the memory in which the error log address range is located has slow access times. All other bits reserved. |
0x10 | GET_EXECUTE_OPERATION_TIMINGS | Returns an encoded QWORD. Bits [63:32]: value in microseconds that the platform expects would be the maximum amount of time it will take to process and complete an EXECUTE_OPERATION. Bits [31:0]: value in microseconds that the platform expects would be the nominal amount of time it will take to process and complete an EXECUTE_OPERATION. |
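The two bit-encoded return values in the table above (the address range attributes and the execute operation timings) can be unpacked as in this minimal sketch. The function names and the returned shapes are hypothetical; only the bit positions come from the table.

```python
NON_VOLATILE = 0x2  # Bit [1] of GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES
SLOW = 0x4          # Bit [2] of GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES

def decode_range_attributes(attrs):
    """Split the attributes value into the two defined flags."""
    return {
        "non_volatile": bool(attrs & NON_VOLATILE),
        "slow": bool(attrs & SLOW),
    }

def decode_execute_timings(qword):
    """Return (nominal_us, maximum_us) from the encoded QWORD."""
    nominal_us = qword & 0xFFFFFFFF          # bits [31:0]
    maximum_us = (qword >> 32) & 0xFFFFFFFF  # bits [63:32]
    return nominal_us, maximum_us
```

An OSPM implementation might use the nominal value to choose a polling interval and the maximum value as a timeout, although the text does not mandate either use.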
The following table defines the serialization action status codes returned from GET_COMMAND_STATUS.
Value | Description |
---|---|
0x00 | Success |
0x01 | Not Enough Space |
0x02 | Hardware Not Available |
0x03 | Failed |
0x04 | Record Store Empty |
0x05 | Record Not Found |
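A small sketch of how the status codes above could be represented and extracted from a raw register value. It follows the GET_COMMAND_STATUS description, where the status occupies bits [8:1] of the value read; the enum and function names are illustrative only.

```python
from enum import IntEnum

class CommandStatus(IntEnum):
    SUCCESS = 0x00
    NOT_ENOUGH_SPACE = 0x01
    HARDWARE_NOT_AVAILABLE = 0x02
    FAILED = 0x03
    RECORD_STORE_EMPTY = 0x04
    RECORD_NOT_FOUND = 0x05

def decode_command_status(raw_register):
    """Extract bits [8:1] of the raw value and map them to a status code.

    Raises ValueError for codes that the table does not define.
    """
    return CommandStatus((raw_register >> 1) & 0xFF)
```

In a real table the shift and mask would come from the Register Region's Bit Offset and the entry's Mask field rather than being hard-coded.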
18.5.1.2. Serialization Instruction Entries
Each Serialization Action consists of a series of one or more Serialization Instructions. A Serialization Instruction represents a primitive operation on an abstracted hardware register represented by the register region as defined in a Serialization Instruction Entry.
A Serialization Instruction Entry describes a region in a serialization hardware register and the serialization instruction to be performed on that region. The following table details the layout of a Serialization Instruction Entry.
Field | Byte Length | Byte Offset | Description |
---|---|---|---|
Serialization Action | 1 | N+0 | The serialization action that this serialization instruction is a part of. |
Instruction | 1 | N+1 | Identifies the instruction to execute. See the Serialization Instructions table for a list of valid serialization instructions. |
Flags | 1 | N+2 | Flags that qualify the instruction. |
Reserved | 1 | N+3 | Must be zero. |
Register Region | 12 | N+4 | Generic Address Structure as defined in Section 5.2.3.2 to describe the address and bit range of the register. |
Value | 8 | N+16 | Value used with READ_REGISTER_VALUE and WRITE_REGISTER_VALUE instructions. |
Mask | 8 | N+24 | The bit mask required to obtain the bits corresponding to the serialization instruction in a given bit range defined by the register region. |
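The 32-byte entry layout above lends itself to a fixed-size decoder. The sketch below assumes little-endian encoding and the usual 12-byte Generic Address Structure sub-layout (address space ID, bit width, bit offset, access size, 64-bit address), which is defined in Section 5.2.3.2 rather than here and should be checked against that section. All Python names are hypothetical.

```python
import struct
from collections import namedtuple

# action, instruction, flags, reserved | GAS (4 x 1 byte + 8-byte address) | value, mask
ENTRY = struct.Struct("<BBBB" "BBBBQ" "QQ")

InstructionEntry = namedtuple(
    "InstructionEntry",
    "action instruction flags reserved "
    "space_id bit_width bit_offset access_size address value mask",
)

def parse_entries(table, count, first_offset=48):
    """Decode `count` consecutive instruction entries from an ERST image."""
    return [
        InstructionEntry(*ENTRY.unpack_from(table, first_offset + i * ENTRY.size))
        for i in range(count)
    ]

def group_by_action(entries):
    """Collect entries per Serialization Action, keeping table order."""
    actions = {}
    for entry in entries:
        actions.setdefault(entry.action, []).append(entry)
    return actions
```

Grouping preserves the order of entries within each action, which matters because the instructions of an action are performed in the order they appear in the table.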
Register Region is described as a generic address structure. This structure describes the physical address of a register as well as the bit range that corresponds to a desired region of the register. The bit range is defined as the smallest set of consecutive bits that contains every bit in the register that is associated with the Serialization Instruction. If bits [6:5] and bits [3:2] all correspond to a Serialization Instruction, the bit range for that instruction would be [6:2].
Because a bit range could contain bits that do not pertain to a particular Serialization Instruction (i.e. bit 4 in the example above), a bit mask is required to distinguish all the bits in the region that correspond to the instruction. The Mask field is defined to be this bit mask with a bit set to ‘1’ for each bit in the bit range (defined by the register region) corresponding to the Serialization Instruction. Note that bit 0 of the bit mask corresponds to the lowest bit in the bit range. In the example used above, the mask would be 11011b or 0x1B.
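The bit range and mask derivation described above can be checked with a short worked example. The helper name `bit_range_and_mask` is hypothetical; the rule it applies (smallest consecutive range, mask bit 0 aligned with the lowest bit of the range) is the one stated in the text.

```python
def bit_range_and_mask(bits):
    """Return (bit_offset, bit_width, mask) for the register bits an instruction uses."""
    low, high = min(bits), max(bits)
    mask = 0
    for bit in bits:
        # Bit 0 of the mask corresponds to the lowest bit in the bit range.
        mask |= 1 << (bit - low)
    return low, high - low + 1, mask

# Bits [6:5] and [3:2] give the range [6:2] and the mask 11011b.
offset, width, mask = bit_range_and_mask({6, 5, 3, 2})
print(offset, width, hex(mask))  # 2 5 0x1b
```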
The Instruction field identifies the operation to be performed on the register region by the instruction entry. The following table identifies the instructions that are supported.
Value | Name | Description |
---|---|---|
0x00 | READ_REGISTER | A READ_REGISTER instruction reads the designated information from the specified Register Region. |
0x01 | READ_REGISTER_VALUE | A READ_REGISTER_VALUE instruction reads the designated information from the specified Register Region and compares the results with the contents of the Value field. If the information read matches the contents of the Value field, TRUE is returned, else FALSE is returned. |
0x02 | WRITE_REGISTER | A WRITE_REGISTER instruction writes a value to the specified Register Region. The Value field is ignored. |
0x03 | WRITE_REGISTER_VALUE | A WRITE_REGISTER_VALUE instruction writes the contents of the Value field to the specified Register Region. |
0x04 | NOOP | This instruction is a NOOP. |
0x05 | LOAD_VAR1 | Loads the VAR1 variable from the register region. |
0x06 | LOAD_VAR2 | Loads the VAR2 variable from the register region. |
0x07 | STORE_VAR1 | Stores the value in VAR1 to the indicated register region. |
0x08 | ADD | Adds VAR1 and VAR2 and stores the result in VAR1. |
0x09 | SUBTRACT | Subtracts VAR1 from VAR2 and stores the result in VAR1. |
0x0A | ADD_VALUE | Adds the contents of the specified register region to Value and stores the result in the register region. |
0x0B | SUBTRACT_VALUE | Subtracts Value from the contents of the specified register region and stores the result in the register region. |
0x0C | STALL | Stall for the number of microseconds specified in Value. |
0x0D | STALL_WHILE_TRUE | OSPM continually compares the contents of the specified register region to Value until the values are not equal. OSPM stalls between each successive comparison. The amount of time to stall is specified by VAR1 and is expressed in microseconds. |
0x0E | SKIP_NEXT_INSTRUCTION_IF_TRUE | This is a control instruction which compares the contents of the register region with Value. If the values match, OSPM skips the next instruction in the sequence for the current action. |
0x0F | GOTO | OSPM will go to the instruction specified by Value. The instruction is specified as the zero-based index. Each instruction for a given action has an index based on its relative position in the array of instructions for the action. |
0x10 | SET_SRC_ADDRESS_BASE | Sets the SRC_BASE variable used by the MOVE_DATA instruction to the contents of the register region. |
0x11 | SET_DST_ADDRESS_BASE | Sets the DST_BASE variable used by the MOVE_DATA instruction to the contents of the register region. |
0x12 | MOVE_DATA | Moves VAR2 bytes of data from SRC_BASE + Offset to DST_BASE + Offset, where Offset is the contents of the register region. |
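To make the variable and control-flow instructions more concrete, here is a deliberately simplified interpreter sketch covering only NOOP, LOAD_VAR1, LOAD_VAR2, STORE_VAR1, ADD, SUBTRACT, SKIP_NEXT_INSTRUCTION_IF_TRUE, and GOTO. It makes several assumptions that are not in the text: registers are modeled as a plain dictionary keyed by address, bit offsets and masks are ignored, arithmetic wraps at 64 bits, and a step budget guards against endless GOTO loops. SUBTRACT follows the table wording literally (VAR2 minus VAR1).

```python
NOOP, LOAD_VAR1, LOAD_VAR2, STORE_VAR1 = 0x04, 0x05, 0x06, 0x07
ADD, SUBTRACT, SKIP_NEXT_IF_TRUE, GOTO = 0x08, 0x09, 0x0E, 0x0F
MASK64 = (1 << 64) - 1  # assumed variable width

def run_action(instructions, regs, max_steps=1000):
    """Run (opcode, address, value) tuples for one action against `regs`."""
    var1 = var2 = 0
    ip = 0  # zero-based index within this action's instruction array
    for _ in range(max_steps):
        if ip >= len(instructions):
            return regs
        op, addr, value = instructions[ip]
        ip += 1
        if op == LOAD_VAR1:
            var1 = regs[addr]
        elif op == LOAD_VAR2:
            var2 = regs[addr]
        elif op == STORE_VAR1:
            regs[addr] = var1
        elif op == ADD:
            var1 = (var1 + var2) & MASK64
        elif op == SUBTRACT:
            var1 = (var2 - var1) & MASK64
        elif op == SKIP_NEXT_IF_TRUE:
            if regs[addr] == value:
                ip += 1  # skip the next instruction of this action
        elif op == GOTO:
            ip = value
        elif op != NOOP:
            raise ValueError("instruction not covered by this sketch")
    raise RuntimeError("step budget exceeded")
```

For example, the sequence LOAD_VAR1, LOAD_VAR2, ADD, STORE_VAR1 sums two registers into a third.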
The Flags field allows qualifying flags to be associated with the instruction. The following table identifies the flags that can be associated with Serialization Instructions.
Value | Name | Description |
---|---|---|
0x01 | PRESERVE_REGISTER | For WRITE_REGISTER and WRITE_REGISTER_VALUE instructions, this flag indicates that bits within the register that are not being written must be preserved rather than destroyed. For READ_REGISTER instructions, this flag is ignored. |
18.5.1.2.1. READ_REGISTER_VALUE
A read register value instruction reads the register region and compares the result with the specified value. If the values are not equal, the instruction fails. This can be described in pseudo code as follows:
X = Read(register)
X = X >> Bit Offset described in Register Region
X = X & Mask
If (X != Value) FAIL
SUCCEED
18.5.1.2.2. READ_REGISTER
A read register instruction reads the register region. The result is a generic value and should not be compared with Value. Value will be ignored. This can be described in pseudo code as follows:
X = Read(register)
X = X >> Bit Offset described in Register Region
X = X & Mask
Return X
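The two read instructions can be expressed as small pure functions. In this sketch `raw` stands for the value already fetched from the register (the actual register access is platform specific and omitted); the function and parameter names are hypothetical.

```python
def read_register(raw, bit_offset, mask):
    """READ_REGISTER: shift down by the Bit Offset, then apply the Mask."""
    return (raw >> bit_offset) & mask

def read_register_value(raw, bit_offset, mask, value):
    """READ_REGISTER_VALUE: True when the masked field equals Value."""
    return read_register(raw, bit_offset, mask) == value
```

With the earlier example (bit range [6:2], mask 0x1B), bit 4 of the register is ignored: a raw value of 0x7C yields the same result as 0x6C.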
18.5.1.2.3. WRITE_REGISTER_VALUE
A write register value instruction writes the specified value to the register region. If PRESERVE_REGISTER is set in Instruction Flags, then the bits not corresponding to the write value instruction are preserved. If the register is preserved, the write value instruction requires a read of the register. This can be described in pseudo code as follows:
X = Value & Mask
X = X << Bit Offset described in Register Region
If (Preserve Register)
    Y = Read(register)
    Y = Y & ~(Mask << Bit Offset)
    X = X | Y
Write(X, Register)
18.5.1.2.4. WRITE_REGISTER
A write register instruction writes a value to the register region. Value will be ignored. If PRESERVE_REGISTER is set in Instruction Flags, then the bits not corresponding to the write instruction are preserved. If the register is preserved, the write instruction requires a read of the register. This can be described in pseudo code as follows:
X = supplied value
X = X & Mask
X = X << Bit Offset described in Register Region
If (Preserve Register)
    Y = Read(register)
    Y = Y & ~(Mask << Bit Offset)
    X = X | Y
Write(X, Register)
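Both write instructions share the same merge logic and differ only in where the written value comes from (the entry's Value field for WRITE_REGISTER_VALUE, a caller-supplied value for WRITE_REGISTER). The sketch below computes the value to be written; `current` stands for the register contents that would be read when PRESERVE_REGISTER is set. The names are hypothetical and the hardware access itself is left out.

```python
def compose_register_write(current, supplied, bit_offset, mask, preserve):
    """Return the raw value to write to the register."""
    x = (supplied & mask) << bit_offset
    if preserve:
        # Keep every bit of the current contents outside the masked field.
        x |= current & ~(mask << bit_offset)
    return x
```

For example, clearing the field from the earlier mask example in a register that reads 0xFF leaves 0x93 when PRESERVE_REGISTER is set, and 0x00 when it is not.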
18.5.1.3. Error Record Serialization Information
The APEI error record includes an 8-byte field called OSPM Reserved, which holds the error record serialization information: a small buffer the platform can use for serialization bookkeeping. The platform is free to use the 48 bits starting at bit offset 16 for its own purposes, for example to indicate the busy/free status of an error record or to record an internal identifier. The following table defines the layout of this field.
Field | Bit Length | Bit Offset | Description |
---|---|---|---|
Signature | 16 | 0 | 16-bit signature (‘ER’) identifying the start of the error record serialization data. |
Platform Serialization Data | 48 | 16 | Platform private error record serialization information. |
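A minimal sketch of packing and unpacking this 8-byte field. It assumes that the signature occupies the first two bytes in the order 'E', 'R' and that the 48-bit platform data follows in little-endian order; the text above does not spell out the byte order, so treat both as assumptions. The function names are hypothetical.

```python
SIGNATURE = b"ER"  # assumed byte order of the 16-bit signature

def pack_serialization_info(platform_data=0):
    """Build the 8-byte field from a 48-bit platform-private value."""
    if not 0 <= platform_data < (1 << 48):
        raise ValueError("platform data must fit in 48 bits")
    return SIGNATURE + platform_data.to_bytes(6, "little")

def unpack_serialization_info(field):
    """Validate the signature and return the 48-bit platform value."""
    if len(field) != 8 or field[:2] != SIGNATURE:
        raise ValueError("missing 'ER' signature")
    return int.from_bytes(field[2:], "little")
```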
18.5.2. Operations
The error record serialization interface comprises three operations: Write, Read, and Clear. OSPM uses the Write operation to write a single error record to the persistent store. The Read operation is used to retrieve a single error record previously recorded to the persistent store using the write operation. The Clear operation allows OSPM to notify the platform that a given error record has been fully processed and is no longer needed, allowing the platform to recover the storage associated with a cleared error record.
Where the Error Log Address Range is NVRAM, significant optimizations are possible since transfer from the Error Log Address Range to a separate storage device is unnecessary. The platform may still, however, copy the record from NVRAM to another device, should it choose to. This allows, for example, the platform to copy error records to private log files. In order to give the platform the opportunity to do this, OSPM must use the Write operation to persist error records even when the Error Log Address Range is NVRAM. The Read and Clear operations, however, are unnecessary in this case as OSPM is capable of reading and clearing error records without assistance from the platform.
18.5.2.1. Writing
To write a single HW error record, OSPM executes the following steps:
1. Initializes the error record’s serialization info. OSPM must fill in the Signature.
2. Writes the error record to be persisted into the Error Log Address Range.
3. Executes the BEGIN_WRITE_OPERATION action to notify the platform that a record write operation is beginning.
4. Executes the SET_RECORD_OFFSET action to inform the platform where in the Error Log Address Range the error record resides.
5. Executes the EXECUTE_OPERATION action to instruct the platform to begin the write operation.
6. Busy waits by continually executing the CHECK_BUSY_STATUS action until FALSE is returned.
7. Executes a GET_COMMAND_STATUS action to determine the status of the write operation. If an error is indicated, OSPM may retry the operation.
8. Executes an END_OPERATION action to notify the platform that the record write operation is complete.
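The OSPM write sequence above can be sketched as follows. Everything about the interface is an assumption made for illustration: `run_action(action, arg=None)` stands for "execute all instructions of this Serialization Action", `copy_to_range(offset, data)` stands for writing into the Error Log Address Range, and the record passed in is expected to already carry its serialization Signature. The polling bound is also an addition, since the steps themselves do not define a timeout.

```python
import time

BEGIN_WRITE_OPERATION, END_OPERATION, SET_RECORD_OFFSET = 0x0, 0x3, 0x4
EXECUTE_OPERATION, CHECK_BUSY_STATUS, GET_COMMAND_STATUS = 0x5, 0x6, 0x7

def write_record(run_action, copy_to_range, record, offset=0,
                 poll_interval=0.0, max_polls=1000):
    """Persist one record and return the platform's command status."""
    copy_to_range(offset, record)              # place record in the range
    run_action(BEGIN_WRITE_OPERATION)
    try:
        run_action(SET_RECORD_OFFSET, offset)
        run_action(EXECUTE_OPERATION)
        for _ in range(max_polls):             # busy wait until not busy
            if not run_action(CHECK_BUSY_STATUS):
                break
            time.sleep(poll_interval)
        else:
            raise TimeoutError("platform stayed busy")
        return run_action(GET_COMMAND_STATUS)  # caller may retry on error
    finally:
        run_action(END_OPERATION)              # always close the operation
```

Wrapping the sequence in `try`/`finally` is a design choice of this sketch so that END_OPERATION is issued even when polling times out.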
When OSPM performs the EXECUTE_OPERATION action in the context of a record write operation, the platform attempts to transfer the error record from the designated offset in the Error Log Address Range to a persistent store of its choice. If the Error Log Address Range is non-volatile RAM, no transfer is required.
Where the platform is required to transfer the error record from the Error Log Address Range to a persistent store, it performs the following steps in response to receiving a write command:
1. Sets some internal state to indicate that it is busy. OSPM polls by executing a CHECK_BUSY_STATUS action until the operation is completed.
2. Reads the error record’s Record ID field to determine where on the storage medium the supplied error record is to be written. The platform attempts to locate the specified error record on the persistent store.
   a. If the specified error record does not exist, the platform attempts to write a new record to the persistent store.
   b. If the specified error record does exist and the existing record is large enough to be overwritten by the supplied error record, the platform can do an in-place replacement. If the existing record is not large enough to be overwritten, the platform must attempt to locate space in which to write the new record. It may mark the existing record as Free and coalesce adjacent free records in order to create the necessary space.
3. Transfers the error record to the selected location on the persistent store.
4. Updates an internal Record Count if a new record was written.
5. Records the status of the operation so OSPM can retrieve the status by executing a GET_COMMAND_STATUS action.
6. Modifies internal busy state as necessary so when OSPM executes CHECK_BUSY_STATUS, the result indicates that the operation is complete.
If the Error Log Address Range resides in NVRAM, the minimum steps required of the platform are:
1. Sets some internal state to indicate that it is busy. OSPM polls by executing a CHECK_BUSY_STATUS action until the operation is completed.
2. Records the status of the operation so OSPM can retrieve the status by executing a GET_COMMAND_STATUS action.
3. Clears internal busy state so when OSPM executes CHECK_BUSY_STATUS, the result indicates that the operation is complete.
18.5.2.2. Reading
During boot, OSPM attempts to retrieve all serialized error records from the persistent store. If the Error Log Address Range does not reside in NVRAM, the following steps are executed by OSPM to retrieve all error records:
1. Executes the BEGIN_READ_OPERATION action to notify the platform that a record read operation is beginning.
2. Executes the SET_RECORD_OFFSET action to inform the platform at what offset in the Error Log Address Range the error record is to be transferred.
3. Executes the SET_RECORD_IDENTIFIER action to inform the platform which error record is to be read from its persistent store.
4. Executes the EXECUTE_OPERATION action to instruct the platform to begin the read operation.
5. Busy waits by continually executing the CHECK_BUSY_STATUS action until FALSE is returned.
6. Executes a GET_COMMAND_STATUS action to determine the status of the read operation.
   a. If the status is Record Store Empty (0x04), continue to step 7.
   b. If an error occurred reading a valid error record, the status will be Failed (0x03); continue to step 7.
   c. If the status is Record Not Found (0x05), indicating that the specified error record does not exist, OSPM retrieves a valid identifier by executing a GET_RECORD_IDENTIFIER action. The platform will return a valid record identifier.
   d. If the status is Success, OSPM transfers the retrieved record from the Error Log Address Range to a private buffer and then executes the GET_RECORD_IDENTIFIER action to determine the identifier of the next record in the persistent store.
7. Executes an END_OPERATION action to notify the platform that the record read operation is complete.
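A possible shape for the boot-time read loop built from the steps above is sketched below. As with any illustration of this kind, the interface is hypothetical: `run_action(action, arg=None)` executes a Serialization Action and `copy_from_range(offset)` returns the record bytes found in the Error Log Address Range. The loop also assumes that the platform reports the identifier 0xFFFFFFFFFFFFFFFF once no further record exists; the text only guarantees that value for an empty store, so a `max_records` cap is added as a safeguard.

```python
BEGIN_READ_OPERATION, END_OPERATION, SET_RECORD_OFFSET = 0x1, 0x3, 0x4
EXECUTE_OPERATION, CHECK_BUSY_STATUS, GET_COMMAND_STATUS = 0x5, 0x6, 0x7
GET_RECORD_IDENTIFIER, SET_RECORD_IDENTIFIER = 0x8, 0x9
SUCCESS, RECORD_NOT_FOUND = 0x00, 0x05
INVALID_ID = 0xFFFFFFFFFFFFFFFF  # assumed "no more records" marker

def read_all_records(run_action, copy_from_range, offset=0,
                     max_polls=1000, max_records=1024):
    """Collect persisted records, one read operation per record."""
    records = []
    record_id = 0  # 0x0 (unspecified): the platform picks its 'first' record
    for _ in range(max_records):
        run_action(BEGIN_READ_OPERATION)
        try:
            run_action(SET_RECORD_OFFSET, offset)
            run_action(SET_RECORD_IDENTIFIER, record_id)
            run_action(EXECUTE_OPERATION)
            for _poll in range(max_polls):
                if not run_action(CHECK_BUSY_STATUS):
                    break
            else:
                raise TimeoutError("platform stayed busy")
            status = run_action(GET_COMMAND_STATUS)
            if status == SUCCESS:
                records.append(copy_from_range(offset))
            elif status != RECORD_NOT_FOUND:
                break  # Record Store Empty, Failed, or other: stop reading
            record_id = run_action(GET_RECORD_IDENTIFIER)
        finally:
            run_action(END_OPERATION)
        if record_id == INVALID_ID:
            break
    return records
```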
The steps performed by the platform to carry out a read request are as follows:
1. Sets some internal state to indicate that it is busy. OSPM polls by executing a CHECK_BUSY_STATUS action until the operation is completed.
2. Using the record identifier supplied by OSPM through the SET_RECORD_IDENTIFIER operation, determines which error record to read:
   a. If the identifier is 0x0 (unspecified), the platform reads the ‘first’ error record from its persistent store (first being implementation specific).
   b. If the identifier is non-zero, the platform attempts to locate the specified error record on the persistent store.
   c. If the specified error record does not exist, the platform sets the status register’s Status to Record Not Found (0x05), and updates the status register’s Identifier field with the identifier of the ‘first’ error record.
3. Transfers the record from the persistent store to the offset specified by OSPM from the base of the Error Log Address Range.
4. Records the Identifier of the ‘next’ valid error record that resides on the persistent store. This allows OSPM to retrieve a valid record identifier by executing a GET_RECORD_IDENTIFIER operation.
5. Records the status of the operation so OSPM can retrieve the status by executing a GET_COMMAND_STATUS action.
6. Clears internal busy state so when OSPM executes CHECK_BUSY_STATUS, the result indicates that the operation is complete.
Where the Error Log Address Range does reside in NVRAM, OSPM requires no platform support to read persisted error records. OSPM can scan the Error Log Address Range on its own and retrieve the error records it previously persisted.
18.5.2.3. Clearing
After OSPM has finished processing an error record, it will notify the platform by clearing the record. This allows the platform to delete the record from the persistent store or mark it such that the space is free and can be reused. The following steps are executed by OSPM to clear an error record:
1. Executes a BEGIN_CLEAR_OPERATION action to notify the platform that a record clear operation is beginning.
2. Executes a SET_RECORD_IDENTIFIER action to inform the platform which error record is to be cleared. This value must not be set to 0x0 (unspecified).
3. Executes an EXECUTE_OPERATION action to instruct the platform to begin the clear operation.
4. Busy waits by continually executing the CHECK_BUSY_STATUS action until FALSE is returned.
5. Executes a GET_COMMAND_STATUS action to determine the status of the clear operation.
6. Executes an END_OPERATION action to notify the platform that the record clear operation is complete.
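The clear sequence differs from the other operations mainly in rejecting the unspecified identifier. A compact sketch, using a hypothetical `run_action(action, arg=None)` callback that executes a Serialization Action and an added polling bound that the steps do not define:

```python
BEGIN_CLEAR_OPERATION, END_OPERATION = 0x2, 0x3
EXECUTE_OPERATION, CHECK_BUSY_STATUS, GET_COMMAND_STATUS = 0x5, 0x6, 0x7
SET_RECORD_IDENTIFIER = 0x9

def clear_record(run_action, record_id, max_polls=1000):
    """Clear one record and return the platform's command status."""
    if record_id == 0:
        raise ValueError("identifier 0x0 (unspecified) must not be cleared")
    run_action(BEGIN_CLEAR_OPERATION)
    try:
        run_action(SET_RECORD_IDENTIFIER, record_id)
        run_action(EXECUTE_OPERATION)
        for _ in range(max_polls):
            if not run_action(CHECK_BUSY_STATUS):
                break
        else:
            raise TimeoutError("platform stayed busy")
        return run_action(GET_COMMAND_STATUS)
    finally:
        run_action(END_OPERATION)
```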
The platform carries out a clear request by performing the following steps:
1. Sets some internal state to indicate that it is busy. OSPM polls by executing a CHECK_BUSY_STATUS action until the operation is completed.
2. Using the record identifier supplied by OSPM through the SET_RECORD_IDENTIFIER operation, determines which error record to clear. This value may not be 0x0 (unspecified).
3. Locates the specified error record on the persistent store.
4. Marks the record as free by updating the Attributes in its serialization header.
5. Updates the internal record count.
6. Clears internal busy state so when OSPM executes CHECK_BUSY_STATUS, the result indicates that the operation is complete.
When the Error Log Address Range resides in NVRAM, the OS requires no platform support to Clear error records.
18.5.2.4. Usage
This section describes several possible ways the error record serialization mechanism might be implemented.
18.5.2.4.1. Error Log Address Range Resides in NVRAM
If the Error Log Address Range resides in NVRAM, then when OSPM writes a record into the logging range, the record is automatically persistent and the busy bit can be cleared immediately. On a subsequent boot, OSPM can read any persisted error records directly from the persistent store range. The size of the persistent store, in this case, is expected to be enough for several error records.
18.5.2.4.2. Error Log Address Range Resides in (volatile) RAM
In this implementation, the Error Log Address Range describes an intermediate location for error records. To persist a record, OSPM copies the record into the Error Log Address Range and sets the Execute control bit, at which time the platform runs the necessary code (SMM code on non-UEFI based systems and UEFI runtime code on UEFI-enabled systems) to transfer the error record from main memory to some persistent store. To read a record, OSPM asks the platform to copy a record from the persistent store to a specified offset within the Error Log Address Range. The size of the Error Log Address Range is at least large enough for one error record.
18.5.2.4.3. Error Log Address Range Resides on Service Processor
In this type of implementation, the Error Log Address Range is really MMIO. When OSPM writes an error record to the Error Log Address Range, it is really writing to memory on a service processor. When OSPM sets the Execute control bit, the platform knows that OSPM is done writing the record and can do something with it, such as moving it into a permanent location (e.g., a hard disk) on the service processor. The size of the persistent store in this type of implementation is typically large enough for one error record.
18.5.2.4.4. Error Log Address Range is Copied Across Network
In this type of implementation, the Error Log Address Range is an intermediate cache for error records. To persist an error record, OSPM copies the record into the Error Log Address Range and sets the Execute control bit, and the platform runs code to transmit this error record over the wire. The size of the Error Log Address Range in this type of implementation is typically large enough for one error record.