2023-03-20, 21:48
The following description refers to v21.
Problem Statement
Under certain conditions, the 'xbmc.getLanguage()' function returns incorrect values.
Details
An addon can call 'xbmc.getLanguage()' with one of three formatting options, ENGLISH_NAME, ISO_639_1 or ISO_639_2, each either with or without additional regional information, giving six possible outputs for Kodi's current language configuration.
ENGLISH_NAME (Unformatted string)
The language name returned comes from the 'name' property in the active language addon's addon.xml file.
If the region is also required, it is supplied from the 'name' property of the selected 'region' node in the language addon's langinfo.xml file. {Issue #1}
ISO_639_1 (Two lower-case character ISO language code. eg: en, de, fr, ja, etc)
The language code is derived by performing a lookup on the 'English Name' of the language that returns the required 2 character code. {Issue #2}
If the region code is also required, it is obtained by using the 'locale' property for that region and performing a lookup against an ISO-639-1 table. {Issue #3}
ISO_639_2 (Three lower-case character ISO language code. eg: eng, deu [or ger], fra [or fre], jpn, etc)
The language code is derived by performing a lookup on the 'name' of the language that returns the required 3 character code. {Issue #4}
If the region code is also required, it is obtained by using the 'locale' property for that region and performing a lookup against an ISO 639-2B table. {Issue #5}
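As a rough illustration of the six combinations described above, an addon could query each format with and without the region. This is only a sketch: 'language_report' is a hypothetical helper name, and the 'api' parameter stands in for the 'xbmc' module so that the function can also be exercised outside Kodi.

```python
def language_report(api):
    """Collect all six xbmc.getLanguage() results.

    `api` is expected to be the `xbmc` module when running inside Kodi.
    The result is keyed by (format name, region flag).
    """
    formats = {
        "ENGLISH_NAME": api.ENGLISH_NAME,
        "ISO_639_1": api.ISO_639_1,
        "ISO_639_2": api.ISO_639_2,
    }
    report = {}
    for name, fmt in formats.items():
        for region in (False, True):
            # Second argument asks for the regional information as well.
            report[(name, region)] = api.getLanguage(fmt, region)
    return report


# Inside an addon (not runnable outside Kodi):
#   import xbmc
#   for key, value in language_report(xbmc).items():
#       xbmc.log("%s -> %s" % (key, value))
```

Logging all six values side by side makes it easier to see which of the issues below affect a given language addon.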
Issues
Issue #1
Although mostly cosmetic, the returned language/region string will look something like "English (Australian)-Australia (24h)" instead of "English-Australia".
Proposed solution: The 'English Name' for the language should be based on a lookup of the 'locale' property [already an ISO 639-1 2 character language code] of the 'language' node in the langinfo.xml file.
Issue #2
Using the name of the language as defined in the 'name' property of the language (as provided in the addon.xml) results in a mismatch when extra details are added to a language, eg: 'English' will match whereas 'English (New Zealand)' will not match.
Proposed solution: This option should return the 2 character ISO-639-1 language code that is already present in the langinfo.xml file.
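The mismatch can be reproduced with a small stand-alone sketch. The table below is an illustrative subset, not Kodi's actual lookup table, and 'code_from_name' is a hypothetical name; the case-insensitive comparison is an assumption made for the example.

```python
# Illustrative subset of a language-name -> ISO 639-1 table (not Kodi's table).
NAME_TO_ISO_639_1 = {
    "english": "en",
    "german": "de",
    "french": "fr",
    "japanese": "ja",
}


def code_from_name(language_name):
    """Look up a 2 character code by exact language name (ignoring case)."""
    return NAME_TO_ISO_639_1.get(language_name.lower())


print(code_from_name("English"))                # en
print(code_from_name("English (New Zealand)"))  # None
```

Because the lookup key is the display name, any decoration added to that name breaks the match, which is why reading the code directly from langinfo.xml looks more robust.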
Issue #3
The 'locale' property for the region actually appears to be an 'ISO 3166-1 alpha-2' country code: it contains two UPPER-CASE characters, whereas an ISO-639-1 language code is written in lower-case.
Sometimes a case-insensitive match will almost work: for example, country code 'DE' (Germany) will match language code 'de' and return 'German' rather than 'Germany'.
Unfortunately, region code 'CA' (Canada) will match the language code 'ca' which is 'Catalan, Valencian' and return the 3 character code 'cat'.
Proposed solution: A new lookup table needs to be introduced to return a country name from the provided 'ISO 3166-1 alpha-2' country code.
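The collision between country codes and language codes can be sketched as follows. Both tables are small illustrative subsets and the function names are hypothetical; the first function only mimics the behaviour described above, it is not Kodi's code.

```python
# Illustrative subsets only; the real ISO tables are much larger.
ISO_639_1_LANGUAGES = {"de": "German", "ca": "Catalan, Valencian", "en": "English"}
ISO_3166_1_COUNTRIES = {"DE": "Germany", "CA": "Canada", "AU": "Australia"}


def region_via_language_table(region_code):
    """Mimic the problem: case-insensitive lookup against the language table."""
    return ISO_639_1_LANGUAGES.get(region_code.lower())


def region_via_country_table(region_code):
    """Proposed behaviour: look the code up in a dedicated country table."""
    return ISO_3166_1_COUNTRIES.get(region_code.upper())


for code in ("DE", "CA", "AU"):
    print(code, "|", region_via_language_table(code), "|", region_via_country_table(code))
```

With these subsets, 'DE' and 'CA' resolve to language names in the first lookup and to country names in the second, while 'AU' finds nothing in the language subset at all.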
Issue #4
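Similar to Issue #2: the lookup is keyed on the 'name' of the language, so a name with extra details (eg: 'English (New Zealand)') will not match.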
Proposed solution: This option should return the 3 character ISO-639-2 language code based on a lookup of the 2 character ISO-639-1 language code already present in the langinfo.xml file.
Issue #5
If a 3 character region code is required (to match the 3 character language code), then 'ISO 3166-1 alpha-3' would appear to be the most relevant code to use.
Proposed solution: A lookup of an 'ISO 3166-1 alpha-3' country code based on the 'ISO 3166-1 alpha-2' country code would provide the desired result.
Findings
A number of functions already exist within Kodi to return some of the required values:
The existing function 'CLangInfo.GetLanguageCode()' already returns the 3 character ISO-639-2 language code ('eng').
The existing function 'CLangCodeExpander.ConvertToISO6391()' can be used to convert the 3 character ISO-639-2 language code to the 2 character ISO-639-1 language code ('eng' -> 'en').
The existing function 'CLangCodeExpander.Lookup()' can be used to obtain the language name (in English) from the 3 character ISO-639-2 language code ('eng' -> 'English').
Recommendation
Return values for the 'ENGLISH_NAME' formatting option should remain unchanged for backwards-compatibility reasons. Addons may already exist that expect and compensate for the erroneous return values.
A new function needs to be created to perform ISO 3166-1 country code lookups. Given the 2 character code available from the region node in the langinfo.xml file, it needs options to return either the full country name or the 3 character code.
A new formatting option, ISO_NAME, should be introduced to return the language name and optionally the country name based on the ISO codes already present in the langinfo.xml file.
Conclusion
I have experimented with the proposed changes on a development fork and they appear to work as expected.
Part of these experiments involved producing a new function, 'CLangCodeExpander.LookupISO31661()', that provides lookups between the ISO 3166-1 2 character code, 3 character code and full name values ('IE' vs 'IRL' vs 'Ireland').
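To make the idea of a three-way lookup concrete, here is a minimal Python sketch of the concept. It is not the C++ function from the fork: 'lookup_iso_3166_1' is a hypothetical name, the table is a small illustrative subset of ISO 3166-1, and the choice to accept any of the three forms as input is an assumption.

```python
# (alpha-2, alpha-3, English short name); illustrative subset of ISO 3166-1.
_COUNTRIES = [
    ("AU", "AUS", "Australia"),
    ("CA", "CAN", "Canada"),
    ("DE", "DEU", "Germany"),
    ("IE", "IRL", "Ireland"),
    ("NZ", "NZL", "New Zealand"),
]

# Index every entry under each of its three forms, upper-cased for matching.
_INDEX = {}
for _entry in _COUNTRIES:
    for _key in _entry:
        _INDEX[_key.upper()] = _entry


def lookup_iso_3166_1(value):
    """Return (alpha2, alpha3, name) for any of the three forms, or None."""
    return _INDEX.get(value.strip().upper())


print(lookup_iso_3166_1("IE"))       # ('IE', 'IRL', 'Ireland')
print(lookup_iso_3166_1("Ireland"))  # ('IE', 'IRL', 'Ireland')
print(lookup_iso_3166_1("XX"))       # None
```

Keeping the country table separate from the language table avoids the 'CA' versus 'ca' ambiguity described in Issue #3.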
I would appreciate feedback on my findings/recommendations before finalising the changes and submitting them for inclusion in the next release.