18.6. Error Injection

This section describes an ACPI table mechanism, called EINJ, that provides a generic interface through which OSPM can inject hardware errors into the platform without requiring platform-specific OSPM-level software. The primary goal of this mechanism is to support testing of the OSPM error handling stack by enabling the injection of hardware errors. Through this capability, OSPM can implement a simple interface for diagnosing and validating error handling on the system.

18.6.1. Error Injection Table (EINJ)

The Error Injection (EINJ) table provides a generic interface mechanism through which OSPM can inject hardware errors into the platform without requiring platform-specific OSPM software. System firmware is responsible for building this table, which is made up of Injection Instruction entries. The following table describes the layout of the EINJ table.

Table 18.23 Error Injection Table (EINJ)

| Field | Byte length | Byte offset | Description |
| --- | --- | --- | --- |
| ACPI Standard Header | | | |
| Header Signature | 4 | 0 | EINJ. Signature for the Error Record Injection Table. |
| Length | 4 | 4 | Length, in bytes, of entire EINJ. Entire table must be contiguous. |
| Revision | 1 | 8 | 1 |
| Checksum | 1 | 9 | Entire table must sum to zero. |
| OEMID | 6 | 10 | OEM ID. |
| OEM Table ID | 8 | 16 | The manufacturer model ID. |
| OEM Revision | 4 | 24 | OEM revision of EINJ. |
| Creator ID | 4 | 28 | Vendor ID of the utility that created the table. |
| Creator Revision | 4 | 32 | Revision of the utility that created the table. |
| Injection Header | | | |
| Injection Header Size | 4 | 36 | Length in bytes of the Injection Interface header. |
| Injection Flags | 1 | 40 | Reserved. Must be zero. |
| Reserved | 3 | 41 | Must be zero. |
| Injection Entry Count | 4 | 44 | The number of Instruction Entries in the Injection Action Table. |
| Injection Action Table | | | |
| Injection Instruction Entries | | 48 | A series of error injection instruction entries, per Injection Entry Count. See Table 18.25. |
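The layout in Table 18.23 can be illustrated with a minimal parsing sketch. It assumes little-endian encoding, a 32-byte Injection Instruction Entry (the sum of the field lengths in Table 18.25), and an Injection Header Size of 12 in the synthetic sample. The helper names `parse_einj`, `checksum_ok` and `build_sample`, and the OEM values, are hypothetical and exist only for illustration.

```python
import struct

# Layout from Table 18.23: the 36-byte ACPI standard header is followed by
# the 12-byte injection header. Little-endian encoding is assumed.
ACPI_HEADER = struct.Struct("<4sIBB6s8sI4sI")   # byte offsets 0..35
INJECTION_HEADER = struct.Struct("<IB3sI")      # byte offsets 36..47
ENTRY_SIZE = 32                                 # one Injection Instruction Entry


def checksum_ok(table: bytes) -> bool:
    """The byte-wise sum of the entire table must be zero modulo 256."""
    return sum(table) % 256 == 0


def parse_einj(table: bytes) -> dict:
    """Validate the headers and split the action table into raw entries."""
    (signature, length, revision, _checksum, _oem_id, _oem_table_id,
     _oem_revision, _creator_id, _creator_revision) = ACPI_HEADER.unpack_from(table, 0)
    header_size, _flags, _reserved, entry_count = INJECTION_HEADER.unpack_from(table, 36)
    if signature != b"EINJ":
        raise ValueError("not an EINJ table")
    if length != len(table) or not checksum_ok(table):
        raise ValueError("bad length or checksum")
    entries = [table[48 + i * ENTRY_SIZE: 48 + (i + 1) * ENTRY_SIZE]
               for i in range(entry_count)]
    return {"revision": revision, "header_size": header_size,
            "entry_count": entry_count, "entries": entries}


def build_sample(entry_count: int = 2) -> bytes:
    """Build a synthetic table (made-up OEM values) with a valid checksum."""
    length = 48 + entry_count * ENTRY_SIZE
    body = bytearray(ACPI_HEADER.pack(b"EINJ", length, 1, 0, b"OEMID ",
                                      b"OEMTABLE", 1, b"TEST", 1))
    body += INJECTION_HEADER.pack(12, 0, b"\0\0\0", entry_count)  # 12 is an assumption
    body += bytes(entry_count * ENTRY_SIZE)      # all-zero placeholder entries
    body[9] = (-sum(body)) % 256                 # fix up the Checksum byte at offset 9
    return bytes(body)
```

For example, `parse_einj(build_sample(2))` yields two 32-byte raw entries. Decoding the entries themselves is a separate step.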

The following table identifies the supported error injection actions.

Table 18.24 Error Injection Actions

| Value | Name | Description |
| --- | --- | --- |
| 0x0 | BEGIN_INJECTION_OPERATION | Indicates to the platform that an error injection is beginning. This allows the platform to set its operational context. |
| 0x1 | GET_TRIGGER_ERROR_ACTION_TABLE | Returns a 64-bit physical memory pointer to the Trigger Action Table. See Table 18.32. |
| 0x2 | SET_ERROR_TYPE | Type of error to inject. Only one ERROR_TYPE can be injected at any given time. If multiple injections are requested at the same time, the platform returns an error condition. See Section 18.6.4. |
| 0x3 | GET_ERROR_TYPE | Returns the error injection capabilities of the platform. |
| 0x4 | END_OPERATION | Indicates to the platform that the current injection operation has ended. This allows the platform to clear its operational context. |
| 0x5 | EXECUTE_OPERATION | Instructs the platform to carry out the current operation based on the current operational context. |
| 0x6 | CHECK_BUSY_STATUS | Returns the state of the current operation. Once an operation has been executed through the EXECUTE_OPERATION action, the platform is required to return an indication that the operation is busy until the operation is completed. This allows software to poll for completion by repeatedly executing the CHECK_BUSY_STATUS action until the platform indicates that the operation is no longer busy. Bit 0 (the least significant bit) of the returned value is 1 while the operation is busy and 0 when it is not busy. |
| 0x7 | GET_COMMAND_STATUS | Returns the status of the current operation. See Table 18.28 for a list of valid command status codes. |
| 0x8 | SET_ERROR_TYPE_WITH_ADDRESS | Type of error to inject, and the address at which to inject it. Only one error type can be injected at any given time. If multiple injections are requested at the same time, the platform returns an error condition. The RegisterRegion field (see Table 18.25) in SET_ERROR_TYPE_WITH_ADDRESS points to a data structure whose format is defined in Table 18.30. Note that executing SET_ERROR_TYPE_WITH_ADDRESS without specifying an address has the same effect as executing SET_ERROR_TYPE. See Table 18.29 for the error type definition. |
| 0x9 | GET_EXECUTE_OPERATION_TIMINGS | Returns an encoded QWORD. Bits [63:32]: the maximum time, in microseconds, that the platform expects it will take to process and complete an EXECUTE_OPERATION. Bits [31:0]: the nominal time, in microseconds, that the platform expects it will take to process and complete an EXECUTE_OPERATION. |
| 0xFF | TRIGGER_ERROR | This value is reserved for entries declared in the Trigger Error Action Table returned in response to a GET_TRIGGER_ERROR_ACTION_TABLE action. The returned table consists of a series of actions, each of which is set to TRIGGER_ERROR (see Table 18.32). When executed by software, the series of TRIGGER_ERROR actions triggers the error injected as a result of the successful completion of an EXECUTE_OPERATION action. |
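As an illustration of the action values and of the two encoded return values described above, the following sketch lists the actions as an enumeration and decodes the CHECK_BUSY_STATUS and GET_EXECUTE_OPERATION_TIMINGS results. The names `EinjAction`, `is_busy` and `decode_timings` are hypothetical.

```python
from enum import IntEnum
from typing import Tuple


class EinjAction(IntEnum):
    """Action values from the Error Injection Actions table."""
    BEGIN_INJECTION_OPERATION = 0x0
    GET_TRIGGER_ERROR_ACTION_TABLE = 0x1
    SET_ERROR_TYPE = 0x2
    GET_ERROR_TYPE = 0x3
    END_OPERATION = 0x4
    EXECUTE_OPERATION = 0x5
    CHECK_BUSY_STATUS = 0x6
    GET_COMMAND_STATUS = 0x7
    SET_ERROR_TYPE_WITH_ADDRESS = 0x8
    GET_EXECUTE_OPERATION_TIMINGS = 0x9
    TRIGGER_ERROR = 0xFF


def is_busy(status: int) -> bool:
    """Bit 0 of the CHECK_BUSY_STATUS result: 1 = busy, 0 = not busy."""
    return bool(status & 0x1)


def decode_timings(qword: int) -> Tuple[int, int]:
    """Split the GET_EXECUTE_OPERATION_TIMINGS QWORD into
    (maximum_us, nominal_us), taken from bits [63:32] and bits [31:0]."""
    return (qword >> 32) & 0xFFFFFFFF, qword & 0xFFFFFFFF
```

A poll loop might use the nominal time as its initial delay and the maximum time as its timeout. That policy is a choice left to the caller rather than something this section mandates.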

18.6.2. Injection Instruction Entries

An Injection action consists of a series of one or more Injection Instructions. An Injection Instruction represents a primitive operation on an abstracted hardware register, represented by the register region as defined in an Injection Instruction Entry.

An Injection Instruction Entry describes a region in an injection hardware register and the injection instruction to be performed on that region.

The following table details the layout of an Injection Instruction Entry.

Table 18.25 Injection Instruction Entry

| Field | Byte length | Byte offset | Description |
| --- | --- | --- | --- |
| Injection Action | 1 | 0 | The injection action that this instruction is a part of. See the Error Injection Actions table for supported injection actions. |
| Instruction | 1 | 1 | Identifies the instruction to execute. See the Injection Instructions table for a list of valid instructions. |
| Flags | 1 | 2 | Flags that qualify the instruction. |
| Reserved | 1 | 3 | Must be zero. |
| Register Region | 12 | 4 | Generic Address Structure that describes the register address and bit range. Address_Space_ID must be 0 (System Memory) or 1 (System IO). This constraint is an attempt to ensure that the registers are accessible in the presence of hardware error conditions. |
| Value | 8 | 16 | The value used by the READ_REGISTER_VALUE and WRITE_REGISTER_VALUE instructions. |
| Mask | 8 | 24 | The bit mask required to obtain the bits corresponding to the injection instruction in a given bit range defined by the register region. |
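A minimal sketch of decoding one 32-byte Injection Instruction Entry follows. It assumes little-endian encoding and the standard 12-byte Generic Address Structure layout defined elsewhere in the ACPI specification (address space ID, register bit width, register bit offset, access size, 64-bit address). The type and function names are hypothetical.

```python
import struct
from typing import NamedTuple

ENTRY = struct.Struct("<BBBB12sQQ")   # 32 bytes, per the entry layout above
GAS = struct.Struct("<BBBBQ")         # 12-byte Generic Address Structure


class RegisterRegion(NamedTuple):
    address_space_id: int
    bit_width: int
    bit_offset: int
    access_size: int
    address: int


class InstructionEntry(NamedTuple):
    action: int
    instruction: int
    flags: int
    region: RegisterRegion
    value: int
    mask: int


def parse_entry(raw: bytes) -> InstructionEntry:
    """Decode one entry and enforce the constraints stated in the table."""
    action, instruction, flags, reserved, gas, value, mask = ENTRY.unpack(raw)
    region = RegisterRegion(*GAS.unpack(gas))
    if reserved != 0:
        raise ValueError("Reserved byte must be zero")
    if region.address_space_id not in (0, 1):
        raise ValueError("Register Region must be System Memory (0) or System IO (1)")
    return InstructionEntry(action, instruction, flags, region, value, mask)
```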

Register Region is described as a generic address structure. This structure describes the physical address of a register as well as the bit range that corresponds to a desired region of the register. The bit range is defined as the smallest set of consecutive bits that contains every bit in the register that is associated with the injection Instruction. If bits [6:5] and bits [3:2] all correspond to an Injection Instruction, the bit range for that instruction would be [6:2].

Because a bit range could contain bits that do not pertain to a particular injection Instruction (i.e. bit 4 in the example above), a bit mask is required to distinguish all the bits in the region that correspond to the instruction. The Mask field is defined to be this bit mask with a bit set to a ‘1’ for each bit in the bit range (defined by the register region) corresponding to the Injection Instruction. Note that bit 0 of the bit mask corresponds to the lowest bit in the bit range. In the example used above, the mask would be 11011b or 0x1B.
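The bit range and mask rules above can be checked with a short worked example. The helper names `region_and_mask` and `extract` are hypothetical. The input is the example already used in the text, bits [6:5] and [3:2].

```python
def region_and_mask(bits):
    """Given the register bit positions an instruction uses, return
    (bit_offset, bit_width, mask) for the smallest enclosing bit range."""
    low, high = min(bits), max(bits)
    mask = 0
    for b in bits:
        mask |= 1 << (b - low)   # mask bit 0 is the lowest bit of the range
    return low, high - low + 1, mask


def extract(register_value, bit_offset, mask):
    """Shift the raw register value down to the range, then apply the mask."""
    return (register_value >> bit_offset) & mask


offset, width, mask = region_and_mask([6, 5, 3, 2])
print(offset, width, hex(mask))   # 2 5 0x1b
```

Bit 4 lies inside the range [6:2] but is cleared in the mask, so `extract(0b00010000, 2, 0x1B)` evaluates to 0.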

Table 18.26 Instruction Flags

| Value | Name | Description |
| --- | --- | --- |
| 0x01 | PRESERVE_REGISTER | For WRITE_REGISTER and WRITE_REGISTER_VALUE instructions, this flag indicates that bits within the register that are not being written must be preserved rather than destroyed. For READ_REGISTER instructions, this flag is ignored. |

18.6.3. Injection Instructions

The table below lists the supported Injection Instructions for Injection Instruction Entries.

Table 18.27 Injection Instructions

| Opcode | Instruction name | Description |
| --- | --- | --- |
| 0x00 | READ_REGISTER | A READ_REGISTER instruction reads the value from the specified register region. |
| 0x01 | READ_REGISTER_VALUE | A READ_REGISTER_VALUE instruction reads the designated information from the specified Register Region and compares the results with the contents of the Value field. If the information read matches the contents of the Value field, TRUE is returned, else FALSE is returned. |
| 0x02 | WRITE_REGISTER | A WRITE_REGISTER instruction writes a value chosen by software to the specified Register Region. The Value field is ignored. |
| 0x03 | WRITE_REGISTER_VALUE | A WRITE_REGISTER_VALUE instruction writes the contents of the Value field to the specified Register Region. |
| 0x04 | NOOP | No operation. |
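The sketch below is one possible reading of how the five instructions, together with the PRESERVE_REGISTER flag, act on a register. It is not a normative implementation. A plain dictionary stands in for hardware registers, and the function name `execute` and its parameters (including `sw_value`, the software-chosen operand of WRITE_REGISTER) are hypothetical.

```python
READ_REGISTER, READ_REGISTER_VALUE, WRITE_REGISTER, WRITE_REGISTER_VALUE, NOOP = range(5)
PRESERVE_REGISTER = 0x01


def execute(registers, instr, address, bit_offset, mask,
            value=0, flags=0, sw_value=0):
    """Run one instruction against `registers`, a dict standing in for hardware."""
    current = registers.get(address, 0)
    field = (current >> bit_offset) & mask
    if instr == READ_REGISTER:
        return field
    if instr == READ_REGISTER_VALUE:
        return field == value                 # TRUE if the region matches Value
    if instr in (WRITE_REGISTER, WRITE_REGISTER_VALUE):
        data = sw_value if instr == WRITE_REGISTER else value
        new = (data & mask) << bit_offset
        if flags & PRESERVE_REGISTER:
            # Keep every bit outside the masked region unchanged.
            new |= current & ~(mask << bit_offset)
        registers[address] = new
        return None
    if instr == NOOP:
        return None
    raise ValueError(f"unknown instruction opcode {instr:#x}")
```

With `registers = {0x1000: 0xFF}`, writing a Value of 0 through mask 0x1B at bit offset 2 with PRESERVE_REGISTER set leaves 0x93 in the register. Without the flag, the same write leaves 0.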

The table below defines the error injection status codes returned from GET_COMMAND_STATUS.

Table 18.28 Command Status Definition

| Value | Description |
| --- | --- |
| 0x0 | Success |
| 0x1 | Unknown Failure |
| 0x2 | Invalid Access |

18.6.4. Error Types

The table below defines the error type codes returned from GET_ERROR_TYPE, as well as the error type set by SET_ERROR_TYPE and the Error Type field set by SET_ERROR_TYPE_WITH_ADDRESS (see Table 18.30).

Both the SET_ERROR_TYPE and SET_ERROR_TYPE_WITH_ADDRESS actions must be present as part of the EINJ Action Table. OSPM is free to choose either of these two actions to inject an error type. The platform will give precedence to SET_ERROR_TYPE_WITH_ADDRESS. That is, if a non-zero Error Type value is set by SET_ERROR_TYPE_WITH_ADDRESS, then any Error Type value set by SET_ERROR_TYPE will be ignored. But if no Error Type is specified by SET_ERROR_TYPE_WITH_ADDRESS, then the platform will use SET_ERROR_TYPE to identify the error type to inject.

Table 18.29 Error Type Definition

| Bit | Description |
| --- | --- |
| 0 | Processor Correctable |
| 1 | Processor Uncorrectable non-fatal |
| 2 | Processor Uncorrectable fatal |
| 3 | Memory Correctable |
| 4 | Memory Uncorrectable non-fatal |
| 5 | Memory Uncorrectable fatal |
| 6 | PCI Express Correctable |
| 7 | PCI Express Uncorrectable non-fatal |
| 8 | PCI Express Uncorrectable fatal |
| 9 | Platform Correctable |
| 10 | Platform Uncorrectable non-fatal |
| 11 | Platform Uncorrectable fatal |
| 12:30 | RESERVED |
| 31 | Vendor Defined Error Type. If this bit is set, then the error types and related data structures are defined by the vendor, as shown in the Vendor Error Type Extension Structure. |
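The error type bit map, and the precedence rule between SET_ERROR_TYPE and SET_ERROR_TYPE_WITH_ADDRESS described earlier in this section, can be sketched as follows. The names `ErrorType`, `supported` and `effective_error_type` are hypothetical. The check that exactly one bit is requested reflects the rule that only one error type can be injected at a time.

```python
from enum import IntFlag


class ErrorType(IntFlag):
    """Bit positions from the Error Type Definition table."""
    PROCESSOR_CORRECTABLE = 1 << 0
    PROCESSOR_UNCORRECTABLE_NONFATAL = 1 << 1
    PROCESSOR_UNCORRECTABLE_FATAL = 1 << 2
    MEMORY_CORRECTABLE = 1 << 3
    MEMORY_UNCORRECTABLE_NONFATAL = 1 << 4
    MEMORY_UNCORRECTABLE_FATAL = 1 << 5
    PCIE_CORRECTABLE = 1 << 6
    PCIE_UNCORRECTABLE_NONFATAL = 1 << 7
    PCIE_UNCORRECTABLE_FATAL = 1 << 8
    PLATFORM_CORRECTABLE = 1 << 9
    PLATFORM_UNCORRECTABLE_NONFATAL = 1 << 10
    PLATFORM_UNCORRECTABLE_FATAL = 1 << 11
    VENDOR_DEFINED = 1 << 31


RESERVED_MASK = 0x7FFFF000   # bits 12:30


def supported(capabilities: int, wanted: int) -> bool:
    """True if exactly one error type is requested and the platform reports it."""
    w = int(wanted)
    one_bit = w != 0 and (w & (w - 1)) == 0
    return one_bit and (capabilities & w) == w


def effective_error_type(set_error_type: int, with_address_error_type: int) -> int:
    """A non-zero Error Type set through SET_ERROR_TYPE_WITH_ADDRESS takes
    precedence; otherwise the value set through SET_ERROR_TYPE is used."""
    if with_address_error_type != 0:
        return with_address_error_type
    return set_error_type
```

For instance, a platform whose GET_ERROR_TYPE result is 0x38 supports the three memory error types, so `supported(0x38, ErrorType.MEMORY_CORRECTABLE)` is true.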

Table 18.30 SET_ERROR_TYPE_WITH_ADDRESS Data Structure

| Field | Byte Length | Byte Offset | Description |
| --- | --- | --- | --- |
| Error Type | 4 | 0x00 | Bit map of error types to inject. See Table 18.29. This field is cleared by the platform once it is consumed. |
| Vendor Error Type Extension Structure Offset | 4 | 0x04 | Specifies the offset from the beginning of the table to the vendor error type extension structure. If no vendor error type extension is present, bit 31 in Error Type must be clear and this field must be set to 0. |
| Flags | 4 | 0x08 | Bit [0]: Processor Identification field valid. Bit [1]: Memory Address and Memory Address Range (mask) fields valid. Bit [2]: PCIe SBDF field valid. Bits [31:3]: RESERVED. This field is cleared by the platform once it is consumed. |
| Processor Error | | | |
| Processor Identification | 4 | 0x0C | Optional field: on non-ARM architectures, this is the physical APIC ID or the X2APIC ID of the processor which is a target for the injection; on ARM systems, this is the ACPI Processor UID value as used in the MADT. |
| Memory Error | | | |
| Memory Address | 8 | 0x10 | Optional field specifying the physical address of the memory that is the target for the injection. Valid if Bit [1] of the Flags field is set. |
| Memory Address Range | 8 | 0x18 | Optional field that provides a range mask for the address field. Valid if Bit [1] of the Flags field is set. If OSPM does not want to provide a range of addresses, this field should be zero. |
| PCIe SBDF | 4 | 0x20 | Byte 3: PCIe Segment. Byte 2: Bus Number. Byte 1: Bits [7:3] Device Number, Bits [2:0] Function Number. Byte 0: RESERVED. |

Table 18.31 Vendor Error Type Extension Structure

| Field | Byte Length | Byte Offset | Attribute | Description |
| --- | --- | --- | --- | --- |
| Length | 4 | 0x00 | Set by Platform. RO for Software. | Length, in bytes, of the entire Vendor Error Type Extension Structure. |
| SBDF | 4 | 0x04 | Set by Platform. RO for Software. | Provides a PCIe Segment, Bus, Device and Function number which can be used to read the Vendor ID, Device ID and Rev ID, so that software can identify the system for error injection purposes. |
| Vendor ID | 2 | 0x08 | Set by Platform. RO for Software. | Vendor ID which identifies the device manufacturer. This is the same as the PCI SIG defined Vendor ID. |
| Device ID | 2 | 0x0A | Set by Platform. RO for Software. | 16-bit ID, assigned by the manufacturer, that identifies this device. |
| Rev ID | 1 | 0x0C | Set by Platform. RO for Software. | 8-bit value, assigned by the manufacturer, that identifies the revision number of the device. |
| Reserved | 3 | 0x0D | Set by Platform. RO for Software. | Reserved |
| OEM Defined structure | N | 0x10 | | The rest of the fields are defined by the OEM. |
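The fixed-size portions of the SET_ERROR_TYPE_WITH_ADDRESS data structure (Table 18.30) and of the Vendor Error Type Extension Structure (Table 18.31) can be sketched with `struct` formats, together with the SBDF byte packing. Little-endian encoding is assumed, and the helper names are hypothetical. Packing the whole structure at once is a simplification: OSPM would normally write individual fields in place, and the offset field is left at 0 here, which corresponds to the case where no vendor extension is used.

```python
import struct

SET_WITH_ADDR = struct.Struct("<IIIIQQI")    # 0x24 bytes, Table 18.30
VENDOR_EXT_HDR = struct.Struct("<IIHHB3s")   # first 0x10 bytes, Table 18.31

FLAG_PROCESSOR_VALID = 1 << 0
FLAG_MEMORY_VALID = 1 << 1
FLAG_SBDF_VALID = 1 << 2


def encode_sbdf(segment, bus, device, function):
    """Byte 3 = segment, byte 2 = bus, byte 1 = device [7:3] | function [2:0],
    byte 0 = reserved (zero)."""
    if not (segment < 256 and bus < 256 and device < 32 and function < 8):
        raise ValueError("SBDF component out of range")
    return (segment << 24) | (bus << 16) | (device << 11) | (function << 8)


def decode_sbdf(sbdf):
    """Return (segment, bus, device, function)."""
    return ((sbdf >> 24) & 0xFF, (sbdf >> 16) & 0xFF,
            (sbdf >> 11) & 0x1F, (sbdf >> 8) & 0x7)


def memory_error_request(error_type, address, range_mask=0):
    """Pack a request that targets a physical memory address (Flags bit 1 set)."""
    return SET_WITH_ADDR.pack(error_type, 0, FLAG_MEMORY_VALID, 0,
                              address, range_mask, 0)
```

`encode_sbdf(0, 3, 0x1F, 7)` evaluates to 0x0003FF00: bus 3 in byte 2, and device 31 with function 7 filling byte 1.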

18.6.5. Trigger Action Table

An error injection operation is a two-step process where the error is injected into the platform and subsequently triggered. After software injects an error into the platform using the EXECUTE_OPERATION action, it then needs to trigger the error. In order to trigger the error, software executes the GET_TRIGGER_ERROR_ACTION_TABLE action, which returns a pointer to a Trigger Error Action table. The format of this table is shown in the table below. Software then executes the instruction entries specified in the Trigger Error Action Table in order to trigger the injected error.

Table 18.32 Trigger Error Action

| TRIGGER_ERROR Header | Byte Length | Byte Offset | Description |
| --- | --- | --- | --- |
| Header Size | 4 | 0 | Length in bytes of this header. |
| Revision | 4 | 4 | |
| Table Size | 4 | 8 | Size in bytes of the entire table. |
| Entry Count | 4 | 12 | The number of Instruction Entries in the TRIGGER_ERROR Action Sequence (see note (1) below). |
| Action Table | | | |
| TRIGGER_ERROR Instruction Entries (see note (2) below) | | 16 | A series of error injection instruction entries as defined in Table 18.25. |

Note

(1) If the “Entry Count” field above is ZERO, then there are no action structures in the TRIGGER_ERROR action table. The platform may make this field ZERO in situations where there is no need for a TRIGGER_ERROR action (for example, in cases where the error injection action seeds as well as consumes the error).

Note

(2) The format of TRIGGER_ERROR Instruction Entries is the same as that of Injection Instruction Entries, as described in Table 18.25.
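A minimal sketch of walking the Trigger Error Action table follows, including the case from note (1) where Entry Count is zero. It assumes little-endian encoding and 32-byte instruction entries starting at byte offset 16. The function name `trigger_entries` is hypothetical.

```python
import struct

# Header Size, Revision, Table Size, Entry Count (4 bytes each).
TRIGGER_HEADER = struct.Struct("<IIII")
ENTRIES_OFFSET = 16
ENTRY_SIZE = 32


def trigger_entries(table: bytes) -> list:
    """Return the raw TRIGGER_ERROR instruction entries; an empty list means
    the platform needs no separate trigger step."""
    _header_size, _revision, table_size, entry_count = TRIGGER_HEADER.unpack_from(table, 0)
    if table_size > len(table):
        raise ValueError("table is truncated")
    if ENTRIES_OFFSET + entry_count * ENTRY_SIZE > table_size:
        raise ValueError("Entry Count does not fit in Table Size")
    return [table[ENTRIES_OFFSET + i * ENTRY_SIZE:
                  ENTRIES_OFFSET + (i + 1) * ENTRY_SIZE]
            for i in range(entry_count)]
```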

18.6.6. Error Injection Operation

Before OSPM can use this mechanism to inject errors, it must discover the error injection capabilities of the platform by executing a GET_ERROR_TYPE action. See Table 18.29 for the definition of error types.

After discovering the error injection capabilities, OSPM can inject and trigger an error according to the sequence described below.

Note that injecting an error into the platform does not automatically consume the error. In response to an error injection, the platform returns a trigger error action table. The software that injected the error must execute the actions in the trigger error action table in order to consume the error. If a specific error type is such that it is automatically consumed on injection, the platform will return a trigger error action table consisting of NOOP instructions.

  1. Executes a BEGIN_INJECTION_OPERATION action to notify the platform that an error injection operation is beginning.

  2. Executes a GET_ERROR_TYPE action to determine the error injection capabilities of the system. This action returns a DWORD bit map of the error types supported by the platform (see Table 18.29).

  3. If GET_ERROR_TYPE returns the DWORD with Bit [31] set, it means that vendor defined error types are present, apart from the standard error types (see Table 18.29).

  4. OSPM chooses the type of error to inject by executing a SET_ERROR_TYPE or a SET_ERROR_TYPE_WITH_ADDRESS action (see Section 18.6.4).

    1. If the OSPM chooses to inject one of the supported standard error types, then it sets the corresponding bit in the error type bitmap. For example, if OSPM chooses to inject a “Memory Correctable” error (bit 3), then the OSPM sets the value 0x0000_0008 in the error type bitmap.

    2. If the OSPM chooses to inject one of the vendor defined error types, then it sets bit[31] in the error type bitmap.

      * OSPM executes the SET_ERROR_TYPE_WITH_ADDRESS action to retrieve the location of the “SET_ERROR_TYPE_WITH_ADDRESS data structure”, and then obtains the location of the “Vendor Error Type Extension Structure” by reading the “Vendor Error Type Extension Structure Offset” field (see Table 18.30 and Table 18.31).

      - OSPM reads the Vendor ID, Device ID and Rev ID from the PCI config space whose path (PCIe Segment/Bus/Device/Function) is provided in the “SBDF” field of the Vendor Error Type Extension Structure.

      - If the Vendor ID/Device ID and Rev IDs match, then the OSPM can identify the platform it is running on and would know the Vendor error types that are supported by this platform.

      - The OSPM writes the vendor error type to inject in the “OEM Defined Structure” field (see Table 18.31).

      * Optionally, for either standard or vendor-defined error types, the OSPM can choose the target of the injection, such as a memory range, PCIe Segment/Bus/Device/Function or Processor APIC ID, depending on the type of error. The OSPM does this by executing the SET_ERROR_TYPE_WITH_ADDRESS action to fill in the appropriate fields of the “SET_ERROR_TYPE_WITH_ADDRESS Data structure” (see Table 18.30).

  5. Executes an EXECUTE_OPERATION action to instruct the platform to begin the injection operation.

  6. Busy waits by continually executing CHECK_BUSY_STATUS action until the platform indicates that the operation is complete by clearing the abstracted Busy bit.

  7. Executes a GET_COMMAND_STATUS action to determine the status of the completed operation.

  8. If the status indicates that the platform cannot inject errors, stop.

  9. Executes a GET_TRIGGER_ERROR_ACTION_TABLE operation to get the physical pointer to the TRIGGER_ERROR action table. This provides the flexibility in systems where injecting an error is a two (or more) step process.

  10. Executes the actions specified in the TRIGGER_ERROR action table.

  11. Executes an END_OPERATION action to notify the platform that the error injection operation is complete.
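The sequence above can be sketched against a simulated platform. Everything in this example is hypothetical: `MockPlatform` stands in for firmware, actions are dispatched by name rather than through instruction entries, and the vendor-defined and address-targeting sub-steps of step 4 are omitted. Executing END_OPERATION on every exit path, including early exits, is a choice of this sketch rather than a stated requirement.

```python
from enum import IntEnum


class Status(IntEnum):
    SUCCESS = 0x0
    UNKNOWN_FAILURE = 0x1
    INVALID_ACCESS = 0x2


class MockPlatform:
    """Hypothetical stand-in for firmware; a real platform is driven through
    the instruction entries of the EINJ table."""

    def __init__(self, capabilities, busy_polls=2):
        self.capabilities = capabilities
        self.busy_polls = busy_polls
        self.error_type = 0
        self.remaining = 0
        self.status = Status.SUCCESS
        self.log = []

    def run(self, action, value=None):
        self.log.append(action)
        if action == "SET_ERROR_TYPE":
            self.error_type = value
        elif action == "GET_ERROR_TYPE":
            return self.capabilities
        elif action == "EXECUTE_OPERATION":
            self.remaining = self.busy_polls
            ok = self.error_type & self.capabilities
            self.status = Status.SUCCESS if ok else Status.INVALID_ACCESS
        elif action == "CHECK_BUSY_STATUS":
            busy = self.remaining > 0
            self.remaining = max(0, self.remaining - 1)
            return int(busy)
        elif action == "GET_COMMAND_STATUS":
            return self.status
        elif action == "GET_TRIGGER_ERROR_ACTION_TABLE":
            return ["TRIGGER_ERROR"]
        return None


def inject(platform, error_type):
    """Walk steps 1 to 11; return True if the error was injected and triggered."""
    platform.run("BEGIN_INJECTION_OPERATION")                        # step 1
    try:
        capabilities = platform.run("GET_ERROR_TYPE")                # steps 2-3
        if not capabilities & error_type:
            return False
        platform.run("SET_ERROR_TYPE", error_type)                   # step 4
        platform.run("EXECUTE_OPERATION")                            # step 5
        while platform.run("CHECK_BUSY_STATUS") & 0x1:               # step 6
            pass
        if platform.run("GET_COMMAND_STATUS") != Status.SUCCESS:     # steps 7-8
            return False
        for action in platform.run("GET_TRIGGER_ERROR_ACTION_TABLE"):  # step 9
            platform.run(action)                                     # step 10
        return True
    finally:
        platform.run("END_OPERATION")                                # step 11
```

With `MockPlatform(capabilities=0x38)`, `inject(platform, 0x8)` requests a Memory Correctable error, polls CHECK_BUSY_STATUS until bit 0 clears, and then runs the single TRIGGER_ERROR action.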