M. Formats — Language Codes and Language Code Arrays

This appendix lists the formats for language codes and language code arrays.

M.1. Specifying individual language codes

The preferred representation of a language code is an RFC 4646 language code identifier.

In addition to RFC 4646 language codes, the alias codes listed in Table M.1 are also supported.

Table M.1 Alias Codes Supported in Addition to RFC 4646

RFC 4646 string    Supported alias string
zh-Hans            zh-chs
zh-Hant            zh-cht
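
The alias handling in Table M.1 can be sketched in C as follows. This is a minimal illustration, not part of this specification. The function name normalize_language_code is hypothetical, and standard C types (char, size_t, bool) stand in for the UEFI types CHAR8, UINTN and BOOLEAN. The case-insensitive match is an assumption borrowed from RFC 4646, which treats language tags as case-insensitive.

```c
#include <stdbool.h>
#include <stddef.h>

/* Compare two ASCII strings without regard to letter case.
   Assumption: aliases are matched case-insensitively, as RFC 4646 tags are. */
static bool ascii_equal_nocase(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; ++a, ++b) {
        char ca = *a, cb = *b;
        if (ca >= 'A' && ca <= 'Z') ca = (char)(ca - 'A' + 'a');
        if (cb >= 'A' && cb <= 'Z') cb = (char)(cb - 'A' + 'a');
        if (ca != cb) return false;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == '\0' && *b == '\0';
}

/* Alias-to-RFC 4646 pairs taken from Table M.1. */
static const struct {
    const char *alias;
    const char *rfc4646;
} kLanguageAliases[] = {
    { "zh-chs", "zh-Hans" },
    { "zh-cht", "zh-Hant" },
};

/* Hypothetical helper: return the RFC 4646 form of a language code.
   Codes listed as aliases in Table M.1 are mapped to their RFC 4646
   equivalents; any other code is returned unchanged. */
const char *normalize_language_code(const char *code)
{
    size_t n = sizeof kLanguageAliases / sizeof kLanguageAliases[0];
    for (size_t i = 0; i < n; ++i) {
        if (ascii_equal_nocase(code, kLanguageAliases[i].alias)) {
            return kLanguageAliases[i].rfc4646;
        }
    }
    return code;
}
```

With this sketch, normalize_language_code("zh-chs") returns "zh-Hans", while a code that is not an alias, such as "en-us", is returned unchanged.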

An RFC 4646 language code is represented as a null-terminated ASCII string.
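
Because a language code is a plain null-terminated ASCII string, a consumer may want a quick syntactic sanity check before using it. The sketch below is a simplified, hypothetical check. is_plausible_language_code is not a UEFI or RFC 4646 function. It accepts hyphen-separated ASCII alphanumeric subtags of one to eight characters whose first subtag is two to eight letters. It does not implement the full RFC 4646 grammar, and it does not consult the IANA subtag registry.

```c
#include <stdbool.h>
#include <stddef.h>

static bool is_ascii_alpha(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
static bool is_ascii_digit(char c) { return c >= '0' && c <= '9'; }

/* Hypothetical, simplified syntactic check of a null-terminated language code.
   Rules checked: subtags are 1-8 ASCII alphanumerics separated by single
   hyphens, and the primary subtag is 2-8 ASCII letters. Not checked: the
   full RFC 4646 ABNF (private-use tags such as "x-foo" are rejected) and
   whether the subtags are actually registered. */
bool is_plausible_language_code(const char *code)
{
    size_t subtag_len = 0;
    bool in_primary = true;

    if (code == NULL || *code == '\0') return false;

    for (const char *p = code; ; ++p) {
        char c = *p;
        if (c == '-' || c == '\0') {
            if (subtag_len == 0) return false;              /* empty subtag: "en--us" or "en-" */
            if (in_primary && subtag_len < 2) return false; /* primary subtag too short */
            if (c == '\0') return true;                     /* reached the terminator cleanly */
            subtag_len = 0;                                 /* start the next subtag */
            in_primary = false;
            continue;
        }
        /* Primary subtag: letters only; later subtags: letters or digits. */
        if (in_primary ? !is_ascii_alpha(c) : !(is_ascii_alpha(c) || is_ascii_digit(c))) return false;
        if (++subtag_len > 8) return false;                 /* subtag longer than 8 characters */
    }
}
```

For example, this check accepts "en-us", "zh-Hant" and "es-419", but rejects "en_US" and "en--us".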

An RFC 4646 language string must be constructed according to the tag creation rules in section 2.3 of RFC 4646. For example, when constructing the primary language subtag for a locale identifier, if both a two-character ISO 639-1 language code and a three-character ISO 639-2 language code exist, the ISO 639-1 code must be used. If no ISO 639-1 code exists, the ISO 639-2/T (Terminology) code must be used for the primary language subtag in preference to the ISO 639-2/B (Bibliographic) code. See RFC 4646 for a complete discussion of this topic.
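
The preference order for the primary language subtag can be expressed as a small selection function. This is a hypothetical sketch: the Iso639Codes structure and the choose_primary_subtag function are illustrative names. The caller is assumed to have already looked up the ISO 639 codes for the language; the function does not consult any ISO 639 table itself.

```c
#include <stddef.h>

/* Hypothetical record of the ISO 639 codes known for one language.
   A NULL member means that kind of code does not exist for the language. */
typedef struct {
    const char *iso639_1;   /* two-letter code, e.g. "de" */
    const char *iso639_2t;  /* three-letter Terminology code, e.g. "deu" */
    const char *iso639_2b;  /* three-letter Bibliographic code, e.g. "ger" */
} Iso639Codes;

/* Pick the primary language subtag using the preference order
   ISO 639-1, then ISO 639-2/T, then ISO 639-2/B.
   Returns NULL if no code is available. */
const char *choose_primary_subtag(const Iso639Codes *codes)
{
    if (codes->iso639_1 != NULL)  return codes->iso639_1;
    if (codes->iso639_2t != NULL) return codes->iso639_2t;
    return codes->iso639_2b;  /* may itself be NULL */
}
```

For German, whose codes are "de" (ISO 639-1), "deu" (ISO 639-2/T) and "ger" (ISO 639-2/B), this function selects "de".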

M.1.1. Specifying language code arrays

Native RFC 4646 format array:

An array of RFC 4646 language codes is represented as a null-terminated CHAR8 array of RFC 4646 language code strings. Consecutive strings are delimited by a semicolon (‘;’) character, and a single null terminator marks the end of the whole array. For example, an array of US English and Traditional Chinese is represented as the null-terminated string “en-us;zh-Hant”.
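
Consumers of a language code array typically need to walk its semicolon-delimited entries. The following C sketch counts the entries and tests whether a given code is present as a complete entry. The function names are hypothetical, and standard C types stand in for the UEFI types. Empty entries are skipped. The case-insensitive comparison follows RFC 4646's case-insensitivity rule rather than any matching rule stated in this appendix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Lowercase an ASCII letter; other characters pass through unchanged. */
static char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Hypothetical helper: count the non-empty entries in a null-terminated,
   semicolon-delimited language code array such as "en-us;zh-Hant". */
size_t language_array_count(const char *array)
{
    size_t count = 0;
    size_t len = 0;
    for (const char *p = array; ; ++p) {
        if (*p == ';' || *p == '\0') {
            if (len > 0) ++count;        /* skip empty entries such as ";;" */
            if (*p == '\0') return count;
            len = 0;
        } else {
            ++len;
        }
    }
}

/* Hypothetical helper: return true if code appears as a complete entry in
   array. Assumption: entries are compared ignoring ASCII case. */
bool language_array_contains(const char *array, const char *code)
{
    const char *entry = array;
    for (;;) {
        /* Compare the current entry with code, character by character. */
        const char *p = entry;
        const char *q = code;
        while (*p != ';' && *p != '\0' && *q != '\0' && ascii_lower(*p) == ascii_lower(*q)) {
            ++p;
            ++q;
        }
        /* A match requires a non-empty entry that ends exactly where code ends. */
        if ((*p == ';' || *p == '\0') && *q == '\0' && p != entry) return true;

        /* Advance to the start of the next entry, or stop at the terminator. */
        while (*p != ';' && *p != '\0') ++p;
        if (*p == '\0') return false;
        entry = p + 1;
    }
}
```

For the array "en-us;zh-Hant", language_array_count returns 2 and language_array_contains(array, "zh-Hant") returns true. language_array_contains(array, "en") returns false because "en" is only a prefix of the entry "en-us".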