M. Formats — Language Codes and Language Code Arrays
This appendix lists the formats for language codes and language code arrays.
M.1. Specifying individual language codes
The preferred representation of a language code is an RFC 4646 language code identifier.
Alias codes supported in addition to RFC 4646
RFC string | Supported Alias String
zh-Hans | zh-chs
zh-Hant | zh-cht
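The alias mapping in the table above can be sketched as a small lookup. This is a minimal illustration, not part of the specification: the names `kAliasTable` and `canonical_language_code` are hypothetical, and the case-insensitive comparison is an assumption based on RFC 4646 treating tags as case-insensitive.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two null-terminated ASCII strings.
 * Returns 1 when they match, 0 otherwise. */
static int ascii_equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Alias strings and their RFC 4646 equivalents, copied from the table. */
static const struct {
    const char *alias;
    const char *rfc;
} kAliasTable[] = {
    { "zh-chs", "zh-Hans" },
    { "zh-cht", "zh-Hant" },
};

/* Hypothetical helper: returns the RFC 4646 string for a supported alias,
 * or the input pointer unchanged when the code is not an alias.
 * Assumption: aliases are matched without regard to case. */
const char *canonical_language_code(const char *code)
{
    size_t i;

    for (i = 0; i < sizeof(kAliasTable) / sizeof(kAliasTable[0]); i++) {
        if (ascii_equal_nocase(code, kAliasTable[i].alias)) {
            return kAliasTable[i].rfc;
        }
    }
    return code;
}
```

With this sketch, `canonical_language_code("zh-chs")` yields `"zh-Hans"`, while a code that is not in the table, such as `"en-US"`, is returned as passed in.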
An RFC 4646 language code is represented as a null-terminated ASCII string.
An RFC 4646 language string must be constructed according to the tag creation rules in section 2.3 of RFC 4646. For example, when constructing the primary language tag for a locale identifier, if a two-character ISO 639-1 language code exists along with a three-character ISO 639-2 language code, then the ISO 639-1 language code must be used. Further, if an ISO 639-1 tag does not exist, then the ISO 639-2/T (Terminology) tag must be used for the primary language tag before an ISO 639-2/B (Bibliographic) tag may be used. See RFC 4646 for a complete discussion of this topic.
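The preference order described above (ISO 639-1 first, then ISO 639-2/T, then ISO 639-2/B) can be illustrated with a short selection routine. This is only a sketch: the function name `select_primary_language_subtag` and the convention of passing NULL or an empty string for a code that does not exist are assumptions made for the example, and the caller is assumed to have looked up the candidate codes elsewhere.

```c
#include <stddef.h>

/* Returns 1 when a candidate code is present (non-NULL and non-empty). */
static int has_code(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* Hypothetical helper: picks the primary language subtag from the codes
 * known for one language, in the order the text requires:
 *   1. ISO 639-1 (two characters), if it exists;
 *   2. otherwise ISO 639-2/T (Terminology);
 *   3. otherwise ISO 639-2/B (Bibliographic).
 * Returns NULL when no candidate is available. */
const char *select_primary_language_subtag(const char *iso639_1,
                                           const char *iso639_2t,
                                           const char *iso639_2b)
{
    if (has_code(iso639_1)) {
        return iso639_1;
    }
    if (has_code(iso639_2t)) {
        return iso639_2t;
    }
    if (has_code(iso639_2b)) {
        return iso639_2b;
    }
    return NULL;
}
```

For French, which has the codes "fr", "fra" (Terminology) and "fre" (Bibliographic), the routine selects "fr". For Filipino, which has no ISO 639-1 code, passing NULL for the first argument selects "fil".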
M.1.1. Specifying language code arrays
Native RFC 4646 format array:
An array of RFC 4646 language codes is represented as a null-terminated CHAR8 array of RFC 4646 language code strings. The strings are separated by a semicolon (‘;’) character. For example, an array of US English and Traditional Chinese would be represented as the null-terminated string “en-us;zh-Hant”.
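Walking such a semicolon-delimited array can be sketched as follows. This is a minimal illustration in standard C rather than a definitive implementation: the function name `next_language_code`, the caller-supplied output buffer, and the use of plain `char` in place of CHAR8 are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator over a language code array such as
 * "en-us;zh-Hant".
 *
 * Copies the entry at `cursor` into `out` (always null-terminated) and
 * returns a pointer to the start of the following entry, or to the
 * terminating null after the last entry. Returns NULL when `cursor` is
 * already at the end of the array, when an argument is invalid, or when
 * the entry does not fit in `out`. */
const char *next_language_code(const char *cursor, char *out, size_t out_size)
{
    size_t len;

    if (cursor == NULL || *cursor == '\0' || out == NULL || out_size == 0) {
        return NULL;
    }

    /* Length of the current entry: up to the next ';' or the final null. */
    len = strcspn(cursor, ";");
    if (len >= out_size) {
        return NULL; /* entry too long for the caller's buffer */
    }

    memcpy(out, cursor, len);
    out[len] = '\0';

    cursor += len;
    if (*cursor == ';') {
        cursor++; /* step over the delimiter */
    }
    return cursor;
}
```

Called repeatedly on “en-us;zh-Hant”, the sketch produces "en-us", then "zh-Hant", then returns NULL. It does not validate that each entry is a well-formed RFC 4646 tag; an empty entry (two adjacent semicolons) is returned as an empty string.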