18.6. Error Injection
This section describes EINJ, an ACPI table mechanism that provides a generic interface through which OSPM can inject hardware errors into the platform without requiring platform-specific OSPM-level software. The primary goal of this mechanism is to support testing of the OSPM error handling stack by enabling the injection of hardware errors. With this capability, OSPM can implement a simple interface for diagnosing and validating error handling on the system.
18.6.1. Error Injection Table (EINJ)
The Error Injection (EINJ) table provides a generic interface mechanism through which OSPM can inject hardware errors into the platform without requiring platform-specific OSPM software. System firmware is responsible for building this table, which is made up of Injection Instruction entries. The following table describes the layout of the EINJ table.
Field | Byte length | Byte offset | Description |
---|---|---|---|
ACPI Standard Header | | | |
Header Signature | 4 | 0 | EINJ. Signature for the Error Record Injection Table. |
Length | 4 | 4 | Length, in bytes, of entire EINJ. Entire table must be contiguous. |
Revision | 1 | 8 | 1 |
Checksum | 1 | 9 | Entire table must sum to zero. |
OEMID | 6 | 10 | OEM ID. |
OEM Table ID | 8 | 16 | The manufacturer model ID. |
OEM Revision | 4 | 24 | OEM revision of EINJ. |
Creator ID | 4 | 28 | Vendor ID of the utility that created the table. |
Creator Revision | 4 | 32 | Revision of the utility that created the table. |
Injection Header | | | |
Injection Header Size | 4 | 36 | Length in bytes of the Injection Interface header. |
Injection Flags | 1 | 40 | Reserved. Must be zero. |
Reserved | 3 | 41 | Must be zero. |
Injection Entry Count | 4 | 44 | The number of Instruction Entries in the Injection Action Table. |
Injection Action Table | | | |
Injection Instruction Entries | | 48 | A series of error injection instruction entries, per Injection Entry Count. See Table 18.25. |
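The layout above can be read with fixed-offset unpacking. The following is a minimal sketch, not a reference implementation: the function names `parse_einj` and `build_sample_einj` are hypothetical, the sample table is synthetic, and it assumes that the Injection Header Size covers the 12 bytes at offsets 36 through 47 and that each instruction entry is 32 bytes long (the sum of the field lengths in Section 18.6.2).

```python
import struct

# Little-endian layouts taken from the byte offsets in the table above.
ACPI_HEADER = struct.Struct("<4sIBB6s8sI4sI")  # offsets 0..35
INJECTION_HEADER = struct.Struct("<IB3sI")     # offsets 36..47
ENTRY_SIZE = 32  # assumed size of one Injection Instruction Entry


def parse_einj(table: bytes) -> dict:
    """Parse the fixed part of an EINJ table and sanity-check it."""
    (signature, length, revision, _checksum, oem_id, oem_table_id,
     oem_revision, creator_id, creator_revision) = ACPI_HEADER.unpack_from(table, 0)
    if signature != b"EINJ":
        raise ValueError("not an EINJ table")
    if length != len(table):
        raise ValueError("Length field does not match the table size")
    if sum(table) & 0xFF != 0:
        raise ValueError("checksum mismatch: all bytes must sum to zero mod 256")
    header_size, flags, _reserved, entry_count = INJECTION_HEADER.unpack_from(table, 36)
    return {
        "length": length,
        "revision": revision,
        "injection_header_size": header_size,
        "injection_flags": flags,
        "entry_count": entry_count,
        "entries_offset": 48,
    }


def build_sample_einj(entry_count: int = 2) -> bytes:
    """Build a synthetic table with all-zero entries, for demonstration only."""
    length = 48 + entry_count * ENTRY_SIZE
    table = bytearray(
        ACPI_HEADER.pack(b"EINJ", length, 1, 0, b"OEMID ", b"OEMTABLE", 1, b"TEST", 1)
        + INJECTION_HEADER.pack(INJECTION_HEADER.size, 0, b"\0\0\0", entry_count)
        + bytes(entry_count * ENTRY_SIZE)
    )
    table[9] = (-sum(table)) & 0xFF  # patch the Checksum byte at offset 9
    return bytes(table)
```

Calling `parse_einj(build_sample_einj(3))` yields a dictionary whose `entry_count` is 3; a real consumer would go on to decode the instruction entries starting at offset 48.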
The following table identifies the supported error injection actions.
Value | Name | Description |
---|---|---|
0x0 | BEGIN_INJECTION_OPERATION | Indicates to the platform that an error injection is beginning. This allows the platform to set its operational context. |
0x1 | GET_TRIGGER_ERROR_ACTION_TABLE | Returns a 64-bit physical memory pointer to the Trigger Action Table. See Table 18.32. |
0x2 | SET_ERROR_TYPE | Type of error to inject. Only one ERROR_TYPE can be injected at any given time. If multiple injections are requested at the same time, the platform will return an error condition. See Section 18.6.4. |
0x3 | GET_ERROR_TYPE | Returns the error injection capabilities of the platform. |
0x4 | END_OPERATION | Indicates to the platform that the current injection operation has ended. This allows the platform to clear its operational context. |
0x5 | EXECUTE_OPERATION | Instructs the platform to carry out the current operation based on the current operational context. |
0x6 | CHECK_BUSY_STATUS | Returns the state of the current operation. Once an operation has been executed through the EXECUTE_OPERATION action, the platform is required to return an indication that the operation is busy until the operation is completed. This allows software to poll for completion by repeatedly executing the CHECK_BUSY_STATUS action until the platform reports not busy. The least significant bit (bit 0) of the returned value indicates the status: 1 means busy and 0 means not busy. |
0x7 | GET_COMMAND_STATUS | Returns the status of the current operation. See Table 18.28 for a list of valid command status codes. |
0x8 | SET_ERROR_TYPE_WITH_ADDRESS | Type of error to inject, and the address at which to inject it. Only one error type can be injected at any given time. If multiple injections are requested at the same time, the platform will return an error condition. The Register Region field (see Table 18.25) in SET_ERROR_TYPE_WITH_ADDRESS points to a data structure whose format is defined in Table 18.30. Note that executing SET_ERROR_TYPE_WITH_ADDRESS without specifying an address has the same effect as executing SET_ERROR_TYPE. See Table 18.29 for the error type definition. |
0x9 | GET_EXECUTE_OPERATION_TIMINGS | Returns an encoded QWORD. Bits [63:32] hold the maximum time, in microseconds, that the platform expects to need to process and complete an EXECUTE_OPERATION. Bits [31:0] hold the nominal time, in microseconds, that the platform expects to need to process and complete an EXECUTE_OPERATION. |
0xFF | TRIGGER_ERROR | This value is reserved for entries declared in the Trigger Error Action Table returned in response to a GET_TRIGGER_ERROR_ACTION_TABLE action. The returned table consists of a series of actions, each of which is set to TRIGGER_ERROR (see Table 18.32). When executed by software, the series of TRIGGER_ERROR actions triggers the error injected as a result of the successful completion of an EXECUTE_OPERATION action. |
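As an illustration of the action values and the two encoded return values described above, the sketch below defines the actions as an enumeration and decodes the CHECK_BUSY_STATUS and GET_EXECUTE_OPERATION_TIMINGS results. The names `EinjAction`, `is_busy`, and `decode_timings` are hypothetical helpers chosen for this example.

```python
from enum import IntEnum


class EinjAction(IntEnum):
    """Error injection action values from the table above."""
    BEGIN_INJECTION_OPERATION = 0x0
    GET_TRIGGER_ERROR_ACTION_TABLE = 0x1
    SET_ERROR_TYPE = 0x2
    GET_ERROR_TYPE = 0x3
    END_OPERATION = 0x4
    EXECUTE_OPERATION = 0x5
    CHECK_BUSY_STATUS = 0x6
    GET_COMMAND_STATUS = 0x7
    SET_ERROR_TYPE_WITH_ADDRESS = 0x8
    GET_EXECUTE_OPERATION_TIMINGS = 0x9
    TRIGGER_ERROR = 0xFF


def is_busy(status: int) -> bool:
    """Bit 0 of the CHECK_BUSY_STATUS result: 1 = busy, 0 = not busy."""
    return bool(status & 0x1)


def decode_timings(qword: int) -> tuple:
    """Split the GET_EXECUTE_OPERATION_TIMINGS QWORD into
    (nominal_microseconds, maximum_microseconds)."""
    nominal = qword & 0xFFFFFFFF          # bits [31:0]
    maximum = (qword >> 32) & 0xFFFFFFFF  # bits [63:32]
    return nominal, maximum
```

For example, a returned QWORD of `(5000 << 32) | 1200` decodes to a nominal time of 1200 microseconds and a maximum of 5000 microseconds, which software could use to bound its polling loop.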
18.6.2. Injection Instruction Entries
An Injection action consists of a series of one or more Injection Instructions. An Injection Instruction represents a primitive operation on an abstracted hardware register, represented by the register region as defined in an Injection Instruction Entry.
An Injection Instruction Entry describes a region in an injection hardware register and the injection instruction to be performed on that region.
The following table details the layout of an Injection Instruction Entry.
Field | Byte length | Byte offset | Description |
---|---|---|---|
Injection Action | 1 | 0 | The injection action that this instruction is a part of. See the Error Injection Actions table for supported injection actions. |
Instruction | 1 | 1 | Identifies the instruction to execute. See the Injection Instructions table for a list of valid instructions. |
Flags | 1 | 2 | Flags that qualify the instruction. |
Reserved | 1 | 3 | Must be zero. |
Register Region | 12 | 4 | A Generic Address Structure that describes the register address and bit range. Address_Space_ID must be 0 (System Memory) or 1 (System IO). This constraint is an attempt to ensure that the registers are accessible in the presence of hardware error conditions. |
Value | 8 | 16 | The value field used by the READ_REGISTER_VALUE and WRITE_REGISTER_VALUE instructions. |
Mask | 8 | 24 | The bit mask required to obtain the bits corresponding to the injection instruction in a given bit range defined by the register region. |
Register Region is described as a generic address structure. This structure describes the physical address of a register as well as the bit range that corresponds to a desired region of the register. The bit range is defined as the smallest set of consecutive bits that contains every bit in the register that is associated with the injection Instruction. If bits [6:5] and bits [3:2] all correspond to an Injection Instruction, the bit range for that instruction would be [6:2].
Because a bit range could contain bits that do not pertain to a particular injection Instruction (i.e. bit 4 in the example above), a bit mask is required to distinguish all the bits in the region that correspond to the instruction. The Mask field is defined to be this bit mask with a bit set to a ‘1’ for each bit in the bit range (defined by the register region) corresponding to the Injection Instruction. Note that bit 0 of the bit mask corresponds to the lowest bit in the bit range. In the example used above, the mask would be 11011b or 0x1B.
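The bit range and mask derivation described above can be sketched as a small helper. The function name `region_and_mask` is hypothetical; it takes the register bit positions associated with an instruction and returns the lowest bit of the range, the width of the range, and the mask relative to that lowest bit.

```python
def region_and_mask(bits):
    """Return (bit_offset, bit_width, mask) for the given register bit positions.

    The bit range is the smallest run of consecutive bits containing every
    listed bit; bit 0 of the mask corresponds to the lowest bit in the range.
    """
    positions = sorted(set(bits))
    if not positions:
        raise ValueError("at least one bit position is required")
    low, high = positions[0], positions[-1]
    mask = 0
    for position in positions:
        mask |= 1 << (position - low)
    return low, high - low + 1, mask
```

With the example from the text, `region_and_mask([6, 5, 3, 2])` gives a range starting at bit 2, five bits wide, with mask 0x1B.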
Value | Name | Description |
---|---|---|
0x01 | PRESERVE_REGISTER | For WRITE_REGISTER and WRITE_REGISTER_VALUE instructions, this flag indicates that bits within the register that are not being written must be preserved rather than destroyed. For READ_REGISTER instructions, this flag is ignored. |
18.6.3. Injection Instructions
The table below lists the supported Injection Instructions for Injection Instruction Entries.
Opcode | Instruction name | Description |
---|---|---|
0x00 | READ_REGISTER | A READ_REGISTER instruction reads the value from the specified register region. |
0x01 | READ_REGISTER_VALUE | A READ_REGISTER_VALUE instruction reads the designated information from the specified Register Region and compares the results with the contents of the Value field. If the information read matches the contents of the Value field, TRUE is returned, else FALSE is returned. |
0x02 | WRITE_REGISTER | A WRITE_REGISTER instruction writes a value chosen by software to the specified Register Region. The Value field is ignored. |
0x03 | WRITE_REGISTER_VALUE | A WRITE_REGISTER_VALUE instruction writes the contents of the Value field to the specified Register Region. |
0x04 | NOOP | No operation. |
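One way to read the instruction semantics above, together with the PRESERVE_REGISTER flag and the Mask field, is sketched below against a simulated register file. This is an interpretation under stated assumptions, not a definitive implementation: the `Entry` class, the `run` function, and the `regs` dictionary are hypothetical, and the mask is assumed to apply relative to the lowest bit of the bit range (`bit_offset`).

```python
from dataclasses import dataclass

# Opcodes from the Injection Instructions table.
READ_REGISTER, READ_REGISTER_VALUE, WRITE_REGISTER, WRITE_REGISTER_VALUE, NOOP = range(5)
PRESERVE_REGISTER = 0x01  # instruction flag


@dataclass
class Entry:
    instruction: int
    flags: int
    address: int     # register address from the Register Region
    bit_offset: int  # lowest bit of the bit range
    value: int
    mask: int


def run(entry, regs, sw_value=0):
    """Execute one entry against `regs`, a dict standing in for hardware registers.

    Returns the value read (READ_REGISTER), a bool (READ_REGISTER_VALUE),
    or None for writes and NOOP.
    """
    current = regs.get(entry.address, 0)
    field = (current >> entry.bit_offset) & entry.mask
    if entry.instruction == READ_REGISTER:
        return field
    if entry.instruction == READ_REGISTER_VALUE:
        return field == entry.value
    if entry.instruction in (WRITE_REGISTER, WRITE_REGISTER_VALUE):
        # WRITE_REGISTER ignores the Value field and uses a software-chosen value.
        source = sw_value if entry.instruction == WRITE_REGISTER else entry.value
        new = (source & entry.mask) << entry.bit_offset
        if entry.flags & PRESERVE_REGISTER:
            # Keep every register bit that the mask does not select.
            new |= current & ~(entry.mask << entry.bit_offset)
        regs[entry.address] = new
        return None
    if entry.instruction == NOOP:
        return None
    raise ValueError(f"unknown instruction {entry.instruction:#x}")
```

In this sketch, writing zero through mask 0x1B at bit offset 2 into a register holding 0xFF leaves 0x93 when PRESERVE_REGISTER is set, and 0x00 when it is not.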
The table below defines the error injection status codes returned from GET_COMMAND_STATUS.
Value | Description |
---|---|
0x0 | Success |
0x1 | Unknown Failure |
0x2 | Invalid Access |
18.6.4. Error Types
The table below defines the error type codes returned from GET_ERROR_TYPE, as well as the error type set by SET_ERROR_TYPE and the Error Type field set by SET_ERROR_TYPE_WITH_ADDRESS (see Table 18.30).
Both the SET_ERROR_TYPE and SET_ERROR_TYPE_WITH_ADDRESS actions must be present as part of the EINJ Action Table. OSPM is free to choose either of these two actions to inject an error type. The platform will give precedence to SET_ERROR_TYPE_WITH_ADDRESS. That is, if a non-zero Error Type value is set by SET_ERROR_TYPE_WITH_ADDRESS, then any Error Type value set by SET_ERROR_TYPE will be ignored. But if no Error Type is specified by SET_ERROR_TYPE_WITH_ADDRESS, then the platform will use SET_ERROR_TYPE to identify the error type to inject.
Bit | Description |
---|---|
0 | Processor Correctable |
1 | Processor Uncorrectable non-fatal |
2 | Processor Uncorrectable fatal |
3 | Memory Correctable |
4 | Memory Uncorrectable non-fatal |
5 | Memory Uncorrectable fatal |
6 | PCI Express Correctable |
7 | PCI Express Uncorrectable non-fatal |
8 | PCI Express Uncorrectable fatal |
9 | Platform Correctable |
10 | Platform Uncorrectable non-fatal |
11 | Platform Uncorrectable fatal |
12:30 | RESERVED |
31 | Vendor Defined Error Type. If this bit is set, then the error types and related data structures are defined by the vendor, as shown in the Vendor Error Type Extension Structure. |
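The bit assignments above, and the precedence rule between SET_ERROR_TYPE and SET_ERROR_TYPE_WITH_ADDRESS, can be sketched as follows. The helper names `decode_error_types` and `effective_error_type` are hypothetical and exist only for illustration.

```python
# Bit position -> error type name, from the table above.
ERROR_TYPES = {
    0: "Processor Correctable",
    1: "Processor Uncorrectable non-fatal",
    2: "Processor Uncorrectable fatal",
    3: "Memory Correctable",
    4: "Memory Uncorrectable non-fatal",
    5: "Memory Uncorrectable fatal",
    6: "PCI Express Correctable",
    7: "PCI Express Uncorrectable non-fatal",
    8: "PCI Express Uncorrectable fatal",
    9: "Platform Correctable",
    10: "Platform Uncorrectable non-fatal",
    11: "Platform Uncorrectable fatal",
    31: "Vendor Defined Error Type",
}


def decode_error_types(bitmap: int) -> list:
    """List the names of all error types whose bit is set in a DWORD bitmap."""
    return [name for bit, name in sorted(ERROR_TYPES.items()) if bitmap & (1 << bit)]


def effective_error_type(set_error_type: int, with_address_error_type: int) -> int:
    """Apply the precedence rule: a non-zero Error Type set through
    SET_ERROR_TYPE_WITH_ADDRESS overrides the value set by SET_ERROR_TYPE."""
    if with_address_error_type != 0:
        return with_address_error_type
    return set_error_type
```

For instance, `decode_error_types(0x8)` reports only "Memory Correctable", since that type occupies bit 3.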
Field | Byte Length | Byte Offset | Description |
---|---|---|---|
Error Type | 4 | 0x00 | Bit map of error types to inject. Refer to the Error Type Definition table. This field is cleared by the platform once it is consumed. |
Vendor Error Type Extension Structure Offset | 4 | 0x04 | Specifies the offset from the beginning of the table to the vendor error type extension structure. If no vendor error type extension is present, Bit [31] of the Error Type field must be clear and this field must be set to 0. |
Flags | 4 | 0x08 | Bit [0]: Processor Identification field valid. Bit [1]: Memory Address and Memory Address Range (mask) fields valid. Bit [2]: PCIe SBDF field valid. Bits [31:3]: RESERVED. This field is cleared by the platform once it is consumed. |
Processor Error | | | |
Processor Identification | 4 | 0x0C | Optional field: on non-ARM architectures, this is the physical APIC ID or the X2APIC ID of the processor which is a target for the injection; on ARM systems, this is the ACPI Processor UID value as used in the MADT. |
Memory Error | | | |
Memory Address | 8 | 0x10 | Optional field specifying the physical address of the memory that is the target for the injection. Valid if Bit [1] of the Flags field is set. |
Memory Address Range | 8 | 0x18 | Optional field that provides a range mask for the address field. Valid if Bit [1] of the Flags field is set. If OSPM does not want to provide a range of addresses, this field should be zero. |
PCIe SBDF | 4 | 0x20 | Byte 3: PCIe Segment. Byte 2: Bus Number. Byte 1: Bits [7:3] Device Number, Bits [2:0] Function Number. Byte 0: RESERVED. |
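To make the field offsets concrete, the sketch below packs a SET_ERROR_TYPE_WITH_ADDRESS data structure for a memory injection and encodes a PCIe SBDF value as laid out above. The names `SET_WITH_ADDRESS`, `pack_sbdf`, and `build_memory_injection` are hypothetical, and the sketch assumes little-endian encoding with no padding between fields.

```python
import struct

# Fields at offsets 0x00, 0x04, 0x08, 0x0C, 0x10, 0x18, 0x20 (36 bytes total).
SET_WITH_ADDRESS = struct.Struct("<IIIIQQI")

FLAG_PROCESSOR_ID_VALID = 1 << 0
FLAG_MEMORY_VALID = 1 << 1
FLAG_PCIE_SBDF_VALID = 1 << 2


def pack_sbdf(segment: int, bus: int, device: int, function: int) -> int:
    """Byte 3 = segment, byte 2 = bus, byte 1 = device[7:3] | function[2:0],
    byte 0 reserved (zero)."""
    dev_fn = ((device & 0x1F) << 3) | (function & 0x7)
    return ((segment & 0xFF) << 24) | ((bus & 0xFF) << 16) | (dev_fn << 8)


def build_memory_injection(error_type: int, address: int, range_mask: int = 0) -> bytes:
    """Fill in the structure for a memory-targeted injection.

    Only the memory fields are marked valid; the vendor extension offset,
    processor identification, and SBDF fields are left at zero.
    """
    return SET_WITH_ADDRESS.pack(
        error_type, 0, FLAG_MEMORY_VALID, 0, address, range_mask, 0
    )
```

A processor- or PCIe-targeted injection would follow the same pattern, setting the corresponding Flags bit and filling the Processor Identification or PCIe SBDF field instead.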
Field | Byte Length | Byte Offset | Attribute | Description |
---|---|---|---|---|
Length | 4 | 0x00 | Set by Platform. RO for Software. | Length, in bytes, of the entire Vendor Error Type Extension Structure. |
SBDF | 4 | 0x04 | Set by Platform. RO for Software. | Provides a PCIe Segment, Bus, Device and Function number which can be used to read the Vendor ID, Device ID and Rev ID, so that software can identify the system for error injection purposes. |
Vendor ID | 2 | 0x08 | Set by Platform. RO for Software. | Vendor ID which identifies the device manufacturer. This is the same as the PCI SIG defined Vendor ID. |
Device ID | 2 | 0x0A | Set by Platform. RO for Software. | A 16-bit ID, assigned by the manufacturer, that identifies this device. |
Rev ID | 1 | 0x0C | Set by Platform. RO for Software. | An 8-bit value, assigned by the manufacturer, that identifies the revision number of the device. |
Reserved | 3 | 0x0D | Set by Platform. RO for Software. | Reserved |
OEM Defined structure | N | 0x10 | | The rest of the fields are defined by the OEM. |
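A minimal sketch of reading the Vendor Error Type Extension Structure follows. The function name `parse_vendor_extension` is hypothetical, the structure is assumed to be little-endian, and the OEM-defined portion is returned as opaque bytes because its format is not specified here.

```python
import struct

# Fixed part: Length, SBDF, Vendor ID, Device ID, Rev ID, Reserved (16 bytes).
VENDOR_EXT = struct.Struct("<IIHHB3s")


def parse_vendor_extension(buf: bytes, offset: int = 0) -> dict:
    """Decode the fixed fields and return the OEM-defined remainder as raw bytes."""
    length, sbdf, vendor_id, device_id, rev_id, _reserved = VENDOR_EXT.unpack_from(buf, offset)
    if length < VENDOR_EXT.size or offset + length > len(buf):
        raise ValueError("Length field is inconsistent with the buffer")
    return {
        "length": length,
        "sbdf": sbdf,
        "vendor_id": vendor_id,
        "device_id": device_id,
        "rev_id": rev_id,
        "oem_data": bytes(buf[offset + VENDOR_EXT.size: offset + length]),
    }
```

Software would compare `vendor_id`, `device_id`, and `rev_id` against the values read from PCI configuration space at the reported SBDF before interpreting `oem_data`.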
18.6.5. Trigger Action Table
An error injection operation is a two-step process where the error is injected into the platform and subsequently triggered. After software injects an error into the platform using the EXECUTE_OPERATION action, it then needs to trigger the error. In order to trigger the error, software executes the GET_TRIGGER_ERROR_ACTION_TABLE action, which returns a pointer to a Trigger Error Action table. The format of this table is shown in the table below. Software then executes the instruction entries specified in the Trigger Error Action Table in order to trigger the injected error.
TRIGGER_ERROR Header | Byte Length | Byte Offset | Description |
---|---|---|---|
Header Size | 4 | 0 | Length in bytes of this header. |
Revision | 4 | 4 | |
Table Size | 4 | 8 | Size in bytes of the entire table. |
Entry Count | 4 | 12 | The number of Instruction Entries in the TRIGGER_ERROR Action Sequence. See note (1) below. |
Action Table | | | |
TRIGGER_ERROR Instruction Entries (see note (2) below) | | 16 | A series of error injection instruction entries as defined in Section 18.6.2. |
Note
(1) If the “Entry Count” field above is ZERO, then there are no action structures in the TRIGGER_ERROR action table. The platform may make this field ZERO in situations where there is no need for a TRIGGER_ERROR action (for example, in cases where the error injection action seeds as well as consumes the error).
Note
(2) The format of TRIGGER_ERROR Instruction Entries is the same as that of Injection Instruction Entries, as described in Section 18.6.2.
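The Trigger Error Action table can be walked with the same fixed-offset approach used for the EINJ table. The sketch below is illustrative only: `parse_trigger_table` is a hypothetical name, entries are assumed to start at byte offset 16 and to be 32 bytes each, and the 12-byte Register Region is decoded using the standard Generic Address Structure field order (address space ID, bit width, bit offset, access size, 64-bit address).

```python
import struct

TRIGGER_HEADER = struct.Struct("<IIII")  # Header Size, Revision, Table Size, Entry Count
ENTRY = struct.Struct("<BBBB12sQQ")      # action, instruction, flags, reserved, GAS, value, mask
GAS = struct.Struct("<BBBBQ")            # Generic Address Structure (12 bytes)


def parse_trigger_table(buf: bytes) -> list:
    """Return the instruction entries of a Trigger Error Action table as dicts."""
    _header_size, _revision, table_size, entry_count = TRIGGER_HEADER.unpack_from(buf, 0)
    needed = TRIGGER_HEADER.size + entry_count * ENTRY.size
    if table_size > len(buf) or needed > table_size:
        raise ValueError("Table Size is inconsistent with Entry Count or buffer")
    entries = []
    offset = TRIGGER_HEADER.size  # entries begin at byte offset 16
    for _ in range(entry_count):
        action, instruction, flags, _res, gas, value, mask = ENTRY.unpack_from(buf, offset)
        space_id, bit_width, bit_offset, access_size, address = GAS.unpack(gas)
        entries.append({
            "action": action,
            "instruction": instruction,
            "flags": flags,
            "space_id": space_id,
            "bit_width": bit_width,
            "bit_offset": bit_offset,
            "access_size": access_size,
            "address": address,
            "value": value,
            "mask": mask,
        })
        offset += ENTRY.size
    return entries
```

An Entry Count of zero simply produces an empty list, which matches note (1): in that case there is nothing for software to execute.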
18.6.6. Error Injection Operation
Before OSPM can use this mechanism to inject errors, it must discover the error injection capabilities of the platform by executing a GET_ERROR_TYPE action. See the Error Type Definition table in Section 18.6.4 for the defined error types.
After discovering the error injection capabilities, OSPM can inject and trigger an error according to the sequence described below.
Note that injecting an error into the platform does not automatically consume the error. In response to an error injection, the platform returns a trigger error action table. The software that injected the error must execute the actions in the trigger error action table in order to consume the error. If a specific error type is such that it is automatically consumed on injection, the platform will return a trigger error action table consisting of NOOP instructions.
1. Executes a BEGIN_INJECTION_OPERATION action to notify the platform that an error injection operation is beginning.
2. Executes a GET_ERROR_TYPE action to determine the error injection capabilities of the system. This action returns a DWORD bit map of the error types supported by the platform (see Table 18.29).
3. If GET_ERROR_TYPE returns the DWORD with Bit [31] set, vendor-defined error types are present in addition to the standard error types (see Table 18.29).
4. Chooses the type of error to inject by executing a SET_ERROR_TYPE or a SET_ERROR_TYPE_WITH_ADDRESS action (see Section 18.6.4).
   * If OSPM chooses to inject one of the supported standard error types, it sets the corresponding bit in the error type bitmap. For example, to inject a “Memory Correctable” error (bit 3), OSPM sets the value 0x0000_0008 in the error type bitmap.
   * If OSPM chooses to inject one of the vendor-defined error types, it sets Bit [31] in the error type bitmap.
     - OSPM executes the SET_ERROR_TYPE_WITH_ADDRESS action to retrieve the location of the “SET_ERROR_TYPE_WITH_ADDRESS data structure”, and then gets the location of the “Vendor Error Type Extension Structure” by reading the “Vendor Error Type Extension Structure Offset” (see Table 18.31).
     - OSPM reads the Vendor ID, Device ID and Rev ID from the PCI config space whose path (PCIe Segment/Bus/Device/Function) is provided in the “SBDF” field of the Vendor Error Type Extension Structure.
     - If the Vendor ID, Device ID and Rev ID match, OSPM can identify the platform it is running on and knows which vendor error types this platform supports.
     - OSPM writes the vendor error type to inject in the “OEM Defined Structure” field (see Table 18.31).
   * Optionally, for either standard or vendor-defined error types, OSPM can choose the target of the injection, such as a memory range, PCIe Segment/Bus/Device/Function or Processor APIC ID, depending on the type of error. OSPM does this by executing the SET_ERROR_TYPE_WITH_ADDRESS action to fill in the appropriate fields of the “SET_ERROR_TYPE_WITH_ADDRESS data structure” (see Table 18.30).
5. Executes an EXECUTE_OPERATION action to instruct the platform to begin the injection operation.
6. Busy waits by continually executing the CHECK_BUSY_STATUS action until the platform indicates that the operation is complete by clearing the abstracted Busy bit.
7. Executes a GET_COMMAND_STATUS action to determine the status of the completed operation.
8. If the status indicates that the platform cannot inject errors, stops.
9. Executes a GET_TRIGGER_ERROR_ACTION_TABLE action to get the physical pointer to the TRIGGER_ERROR action table. This provides flexibility in systems where injecting an error is a process of two or more steps.
10. Executes the actions specified in the TRIGGER_ERROR action table.
11. Executes an END_OPERATION action to notify the platform that the error injection operation is complete.
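The overall sequence can be sketched against a simulated platform. Everything in this example is hypothetical: `FakePlatform` stands in for firmware, actions are identified by name rather than by executing instruction entries, the trigger table is modeled as a list of callables, and only standard error types set through SET_ERROR_TYPE are covered. The sketch also chooses to issue END_OPERATION even when the injection is abandoned, which is a design choice of this example.

```python
MEMORY_CORRECTABLE = 1 << 3  # bit 3 in the error type table
STATUS_SUCCESS = 0x0
STATUS_UNKNOWN_FAILURE = 0x1


class FakePlatform:
    """Simulated platform that stays busy for a fixed number of polls."""

    def __init__(self, supported=MEMORY_CORRECTABLE, busy_polls=2):
        self.supported = supported
        self.busy_polls = busy_polls
        self.pending = 0
        self.error_type = 0
        self.triggered = False
        self.log = []

    def execute(self, action, value=0):
        self.log.append(action)
        if action == "GET_ERROR_TYPE":
            return self.supported
        if action == "SET_ERROR_TYPE":
            self.error_type = value
        elif action == "EXECUTE_OPERATION":
            self.pending = self.busy_polls
        elif action == "CHECK_BUSY_STATUS":
            busy = 1 if self.pending > 0 else 0
            self.pending = max(0, self.pending - 1)
            return busy
        elif action == "GET_COMMAND_STATUS":
            ok = self.error_type & self.supported
            return STATUS_SUCCESS if ok else STATUS_UNKNOWN_FAILURE
        elif action == "GET_TRIGGER_ERROR_ACTION_TABLE":
            return [lambda: setattr(self, "triggered", True)]
        return 0


def inject(platform, error_type):
    """Run the injection sequence; return True if the error was triggered."""
    platform.execute("BEGIN_INJECTION_OPERATION")
    try:
        if not platform.execute("GET_ERROR_TYPE") & error_type:
            return False  # requested type is not supported
        platform.execute("SET_ERROR_TYPE", error_type)
        platform.execute("EXECUTE_OPERATION")
        while platform.execute("CHECK_BUSY_STATUS") & 0x1:
            pass  # real code would bound this loop with a timeout
        if platform.execute("GET_COMMAND_STATUS") != STATUS_SUCCESS:
            return False
        for trigger in platform.execute("GET_TRIGGER_ERROR_ACTION_TABLE"):
            trigger()
        return True
    finally:
        platform.execute("END_OPERATION")
```

With the default `FakePlatform`, `inject(platform, MEMORY_CORRECTABLE)` returns True and the platform's log records the actions in the order of the numbered steps above.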