M. Formats — Language Codes and Language Code Arrays

This appendix lists the formats for language codes and language code arrays.

M.1. Specifying individual language codes

The preferred representation of a language code is an RFC 4646 language code identifier.


Table M.1 Alias Codes Supported in Addition to RFC 4646

RFC String    Supported Alias String
zh-Hans       zh-chs
zh-Hant       zh-cht

An RFC 4646 language code is represented as a null-terminated ASCII string.
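As an illustration of how the alias strings in Table M.1 relate to null-terminated ASCII language codes, the following is a minimal sketch in C. The names lang_alias, k_aliases, and normalize_language_code are hypothetical and are not defined by this specification. The sketch also assumes that aliases are matched without regard to ASCII case, which is an assumption made here for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* One row of Table M.1: a supported alias and its RFC 4646 string.
 * (Hypothetical type, used only for this illustration.) */
typedef struct {
    const char *alias;
    const char *rfc;
} lang_alias;

static const lang_alias k_aliases[] = {
    { "zh-chs", "zh-Hans" },
    { "zh-cht", "zh-Hant" },
};

/* Lowercase a single ASCII character; other bytes pass through unchanged. */
static char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Return 1 if two null-terminated ASCII strings match, ignoring case. */
static int ascii_equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (ascii_lower(*a) != ascii_lower(*b)) {
            return 0;
        }
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

/* Map a supported alias to its RFC 4646 string.
 * Any other input is returned unchanged (same pointer).
 * Case-insensitive alias matching is an assumption of this sketch. */
const char *normalize_language_code(const char *lang)
{
    assert(lang != NULL);
    for (size_t i = 0; i < sizeof k_aliases / sizeof k_aliases[0]; i++) {
        if (ascii_equal_nocase(lang, k_aliases[i].alias)) {
            return k_aliases[i].rfc;
        }
    }
    return lang;
}
```

With this sketch, passing "zh-chs" yields "zh-Hans", while a string that is not an alias, such as "en-US", is returned as given.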

An RFC 4646 language string must be constructed according to the tag creation rules in section 2.3 of RFC 4646. For example, when constructing the primary language tag for a locale identifier, if a two-character ISO 639-1 language code exists along with a three-character ISO 639-2 language code, then the ISO 639-1 language code must be used. Further, if an ISO 639-1 tag does not exist, then the ISO 639-2/T (Terminology) tag must be used for the primary language tag before an ISO 639-2/B (Bibliographic) tag may be used. See RFC 4646 for a complete discussion of this topic.
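The order of preference described above can be sketched as a small C helper. The function name select_primary_subtag and its parameters are hypothetical. The caller is assumed to have already looked up the candidate codes for a language and to pass NULL or an empty string for any code that does not exist. This covers only the choice of primary tag, not the full set of rules in section 2.3 of RFC 4646.

```c
#include <stddef.h>

/* Return 1 if a candidate code is present (non-NULL and non-empty). */
static int has_code(const char *code)
{
    return code != NULL && code[0] != '\0';
}

/* Choose the primary language tag in the order of preference given in
 * the text: ISO 639-1 first, then ISO 639-2/T, then ISO 639-2/B.
 * Returns NULL if no candidate is available. */
const char *select_primary_subtag(const char *iso639_1,
                                  const char *iso639_2t,
                                  const char *iso639_2b)
{
    if (has_code(iso639_1)) {
        return iso639_1;
    }
    if (has_code(iso639_2t)) {
        return iso639_2t;
    }
    if (has_code(iso639_2b)) {
        return iso639_2b;
    }
    return NULL;
}
```

For German, which has the ISO 639-1 code "de", the ISO 639-2/T code "deu", and the ISO 639-2/B code "ger", the helper returns "de".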

To provide backwards compatibility with preexisting EFI 1.10 drivers, a UEFI platform may support deprecated protocols that represent languages in the ISO 639-2 format. These include the following protocols: UNICODE_COLLATION_INTERFACE, EFI_DRIVER_CONFIGURATION_PROTOCOL, EFI_DRIVER_DIAGNOSTICS_PROTOCOL, and EFI_COMPONENT_NAME_PROTOCOL. The deprecated LangCodes and Lang global variables may also be supported by a platform for backwards compatibility.

M.1.1. Specifying language code arrays

Native RFC 4646 format array:

An array of RFC 4646 language codes is represented as a null-terminated CHAR8 array of RFC 4646 language code strings. The strings are separated from one another by a semicolon (‘;’) character. For example, an array of US English and Traditional Chinese would be represented as the null-terminated string “en-us;zh-Hant”.
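One way to walk such a semicolon-delimited array is sketched below in standard C. The function names count_languages and get_language_at are hypothetical and are not part of this specification. Plain char is used in place of CHAR8 so that the example builds outside a firmware environment. Skipping empty entries (for example, those produced by a trailing semicolon) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the non-empty language codes in a semicolon-delimited,
 * null-terminated language code array. */
size_t count_languages(const char *lang_array)
{
    assert(lang_array != NULL);
    size_t count = 0;
    const char *p = lang_array;
    while (*p != '\0') {
        size_t len = strcspn(p, ";");   /* length up to ';' or end */
        if (len > 0) {
            count++;
        }
        p += len;
        if (*p == ';') {
            p++;                        /* step over the delimiter */
        }
    }
    return count;
}

/* Copy the index-th (zero-based) non-empty language code into out,
 * which has room for out_size bytes including the terminator.
 * Returns 1 on success, 0 if index is out of range or out is too small. */
int get_language_at(const char *lang_array, size_t index,
                    char *out, size_t out_size)
{
    assert(lang_array != NULL && out != NULL);
    const char *p = lang_array;
    while (*p != '\0') {
        size_t len = strcspn(p, ";");
        if (len > 0) {
            if (index == 0) {
                if (len + 1 > out_size) {
                    return 0;           /* destination too small */
                }
                memcpy(out, p, len);
                out[len] = '\0';
                return 1;
            }
            index--;
        }
        p += len;
        if (*p == ';') {
            p++;
        }
    }
    return 0;                           /* ran out of entries */
}
```

Applied to the string “en-us;zh-Hant”, count_languages reports two entries, and get_language_at returns "en-us" for index 0 and "zh-Hant" for index 1.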