Kodi Community Forum

Full Version: EPG Genre Description Observations and Questions
I have been looking at the 'content identifier' (genre) description strings in the GB and AU language files.  For the most part, the string ID number is 19484 greater than the decimal equivalent of the ETSI code.

For example, Kodi #19556 minus 19484 equals 72 (0x48) which matches the ETSI code for "Water sport".
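The relationship can be written as a small sketch. The names `kGenreStringOffset`, `StringIdFromEtsi` and `EtsiFromStringId` are my own for illustration and are not Kodi identifiers; the offset is simply the difference I observed in the language files.

```cpp
#include <cstdint>

// Assumed offset: the observed difference between Kodi string IDs and ETSI codes.
constexpr int kGenreStringOffset = 19484;

// Hypothetical helper: string ID that would hold the label for an ETSI code.
constexpr int StringIdFromEtsi(std::uint8_t etsiCode)
{
  return kGenreStringOffset + etsiCode;
}

// Hypothetical helper: ETSI code implied by a string ID in the genre block.
constexpr int EtsiFromStringId(int stringId)
{
  return stringId - kGenreStringOffset;
}

// The "Water sport" example: 19556 - 19484 = 72 = 0x48.
static_assert(EtsiFromStringId(19556) == 0x48, "Water sport");
static_assert(StringIdFromEtsi(0x48) == 19556, "round trip");
```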

I hope that I am mistaken, but I think that I may have found a number of anomalies:
 
  • Kodi #19660 "Special characteristics" should read "Original language" (ETSI 0xB0)
  • Kodi #19661 "Original language" should read "Black & white" (ETSI 0xB1)
  • Kodi #19662 "Black & white" should read "Unpublished" (ETSI 0xB2)
  • Kodi #19663 "Unpublished" should read "Live broadcast" (ETSI 0xB3)
  • Kodi #19664 "Live broadcast" should read "Plano-stereoscopic" (ETSI 0xB4)
  • Kodi #19676 "Drama" should read "Adult (general)" (ETSI 0xC0)

Q1) Because these changes are to the master GB language file and it is more than simply fixing a typo, is this something an end-user like me can do or does it need to be reviewed by somebody more knowledgeable first?  I'm happy to create a Weblate account and make the required alterations.

Also, in the AU translation, there is a typo in Kodi #19601: "Litreature" should read "Literature".

The scheme described above all comes unstuck at Kodi #19685 onwards where some dialogue box strings seem to have been inserted.  Ideally Kodi #19499 to #19755 should have been reserved for content identifier descriptions within the master strings.po file.

Luckily, these incursions are limited to 3 source files:
 
  • PVRGUIActionsPowerManagement.cpp (#19685, #19690, #19691, #19692, #19693, #19694, #19695, #19696)
  • PVRContextMenus.cpp (#19686, #19687)
  • PVRGUIActionsPlayback.cpp (#19687)

Updating these modules and reassigning the 10 'msgctxt' values would be a trivial exercise if not for the 70+ translations that would also have to be updated.

These strings are in the ETSI range 0xC9 to 0xD4, which are currently designated as 'reserved for future use' so the change is currently not urgent.
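As a quick check of that range, here is a minimal sketch that subtracts the observed offset from the ten intruding string IDs listed above. The names `kIntrudingIds`, `LowestEtsiCode` and `HighestEtsiCode` are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cstddef>

// Assumed offset: the observed difference between Kodi string IDs and ETSI codes.
constexpr int kGenreStringOffset = 19484;

// The ten dialogue-box string IDs listed above.
constexpr std::array<int, 10> kIntrudingIds = {
    19685, 19686, 19687, 19690, 19691, 19692, 19693, 19694, 19695, 19696};

// Lowest ETSI code that the intruding strings would occupy.
constexpr int LowestEtsiCode()
{
  int lowest = kIntrudingIds[0] - kGenreStringOffset;
  for (std::size_t i = 1; i < kIntrudingIds.size(); ++i)
  {
    const int code = kIntrudingIds[i] - kGenreStringOffset;
    if (code < lowest)
      lowest = code;
  }
  return lowest;
}

// Highest ETSI code that the intruding strings would occupy.
constexpr int HighestEtsiCode()
{
  int highest = kIntrudingIds[0] - kGenreStringOffset;
  for (std::size_t i = 1; i < kIntrudingIds.size(); ++i)
  {
    const int code = kIntrudingIds[i] - kGenreStringOffset;
    if (code > highest)
      highest = code;
  }
  return highest;
}

static_assert(LowestEtsiCode() == 0xC9, "19685 - 19484");
static_assert(HighestEtsiCode() == 0xD4, "19696 - 19484");
```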

Q2) Is Weblate capable of automatically assigning existing strings to new numbers, or will these have to be entered manually into GB and then propagated to other languages?

Q3) Is the effort to change the string IDs now worth it considering that this range may never actually be used by ETSI in the future?

At the very least, a note should be made in the master strings.po file to reserve the rest of the range and prevent further incursions.

ETSI content identifier descriptions can be found in Table 29 of the following document:

https://www.etsi.org/deliver/etsi_en/300...11701a.pdf
Maybe @Gade might have some comment here
I think you are overthinking this a bit. The string numbers in the PO files aren't tied to anything. Any correlation you think you are seeing is more likely because, when the usage of said strings was implemented, that was an available block of numbers the dev picked.

If you look at the code that actually uses the string, e.g. 19660

https://github.com/xbmc/xbmc/blob/56b88b...h.cpp#L136

It's quite clear the intent of the string usage is for EPG_EVENT_CONTENTMASK_SPECIAL and therefore your "Original Language" assumption around ETSI names is incorrect.
Thank you for your explanation @Fuzzard, however, I respectfully disagree.  I have done further analysis and I believe that I have uncovered the root cause of the misalignment.

As background info: the ETSI content_descriptor is an 8 bit field.  The first nibble denotes the generic category group, for example '0x40' indicates 'Sport'.  The second nibble denotes the specific category, for example '0x49' denotes 'Winter Sports'.
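A minimal sketch of that split follows. The helper names `CategoryGroup` and `SubCategory` are mine and do not appear in the Kodi source.

```cpp
#include <cstdint>

// Upper nibble: the generic category group (e.g. 0x40 for 'Sport').
constexpr std::uint8_t CategoryGroup(std::uint8_t contentDescriptor)
{
  return static_cast<std::uint8_t>(contentDescriptor & 0xF0);
}

// Lower nibble: the specific category within the group.
constexpr std::uint8_t SubCategory(std::uint8_t contentDescriptor)
{
  return static_cast<std::uint8_t>(contentDescriptor & 0x0F);
}

// 0x49 'Winter Sports' splits into group 0x40 and specific category 0x9.
static_assert(CategoryGroup(0x49) == 0x40, "Sport group");
static_assert(SubCategory(0x49) == 0x9, "Winter Sports");
```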

The constants that you highlighted in your example are defined in the following file '/xbmc/addons/kodi-dev-kit/include/kodi/c-api/addon-instance/pvr/pvr_epg.h'.
 
Code:
EPG_EVENT_CONTENTMASK_UNDEFINED = 0x00,
EPG_EVENT_CONTENTMASK_MOVIEDRAMA = 0x10,
EPG_EVENT_CONTENTMASK_NEWSCURRENTAFFAIRS = 0x20,
EPG_EVENT_CONTENTMASK_SHOW = 0x30,
EPG_EVENT_CONTENTMASK_SPORTS = 0x40,
EPG_EVENT_CONTENTMASK_CHILDRENYOUTH = 0x50,
EPG_EVENT_CONTENTMASK_MUSICBALLETDANCE = 0x60,
EPG_EVENT_CONTENTMASK_ARTSCULTURE = 0x70,
EPG_EVENT_CONTENTMASK_SOCIALPOLITICALECONOMICS = 0x80,
EPG_EVENT_CONTENTMASK_EDUCATIONALSCIENCE = 0x90,
EPG_EVENT_CONTENTMASK_LEISUREHOBBIES = 0xA0,
EPG_EVENT_CONTENTMASK_SPECIAL = 0xB0,
EPG_EVENT_CONTENTMASK_USERDEFINED = 0xF0,
....
EPG_EVENT_CONTENTSUBMASK_SPORTS_WINTERSPORTS = 0x9,

With the exception of EPG_EVENT_CONTENTMASK_SPECIAL the string IDs of all of these constants align with the ETSI content identifier plus 19484.  Furthermore, when the constants describing the specific categories (EPG_EVENT_CONTENTSUBMASK_********) are taken into account, they too align with the ETSI description values plus 19484.  There are even gaps of unused Kodi language strings where no ETSI code is defined.

Being able to filter on the generic category is very useful.  There is no category for 'Cricket', for example.  So a user would be best to search by the generic category '0x40 - Sport'.  However, a tennis fan can search specifically for '0x44 - Tennis/Squash'.

Upon further investigation, I discovered that the root cause lies partly with the ETSI standard and partly with how the programmer has dealt with a shortcoming in the ETSI standard.

In the ETSI standard, for every category group except 'Special characteristics', the first detailed category is always a generic title as in the previous 'Sports' example.  With the 'Special' category, however, the first code is a detailed 'Original Language' description and not a generic 'Special' description.  There is no generic 'Special' label designated for the category group in the ETSI standard.

I assume that when the programmer was looking for labels to use when performing a generic genre search, there was no label for 'Special' so it was just added into the sequence.  This had the effect of pushing subsequent labels out of alignment with their ETSI counterparts as well as excluding the 'Original Language' detailed description.
 
Code:
EPG_EVENT_CONTENTSUBMASK_SPECIAL_GENERAL = 0x0, // This should be for 'Original Language'
EPG_EVENT_CONTENTSUBMASK_SPECIAL_BLACKANDWHITE = 0x1, // This is correct
EPG_EVENT_CONTENTSUBMASK_SPECIAL_UNPUBLISHED = 0x2, // This is correct
EPG_EVENT_CONTENTSUBMASK_SPECIAL_LIVEBROADCAST = 0x3, // This is correct
EPG_EVENT_CONTENTSUBMASK_SPECIAL_PLANOSTEREOSCOPIC = 0x4, // This is correct
EPG_EVENT_CONTENTSUBMASK_SPECIAL_LOCALORREGIONAL = 0x5, // This is correct
EPG_EVENT_CONTENTSUBMASK_SPECIAL_USERDEFINED = 0xF // This is correct

In hindsight, the 'Special' Kodi label should have been allocated a string number beyond the range of the rest of the ETSI descriptions.  In my opinion, this is what should be done and the other labels should be brought into line.

None of this, unfortunately, addresses the random dialogue box strings that have been inserted into the block at the tail end of the area apparently reserved for the ETSI sequence.
Again, the label numbers in the PO are NOT tied to ANYTHING. Forget the whole +19484. A string in the PO has a number. That number is referenced in the code to display that string. It has no understanding or knowledge about anything other than number = string displayed.

Your ETSI stuff in the actual enums could very well be an issue, but again, it's NOT tied to anything with the PO being ETSI + 19484.
I do understand there is no actual link between the PO numbers and the ETSI codes.  The numbers by themselves have no significance to each other.  ETSI sets their numbers and Team Kodi sets theirs.  This does not, however, preclude Kodi developers from making design decisions to somehow align the two.

The ETSI content identifiers can be thought of as an 8-bit unsigned integer storing the values from 0 to 255 (inclusive).  Each of these values represents either an in-use genre description or a reservation for future use.

When it came time to add the ETSI descriptions to the PO, the developer doing so may have made a design decision to select a free block of 256 sequential PO numbers to store the relevant ETSI descriptions and also allow for future expansion.  There was no magic, just a design decision.

Such a design decision regarding the location of the ETSI codes within the PO would facilitate an elegant lookup function such as:
 
Code:
// Pseudocode: 'offset' is the first PO number of the reserved block
getGenreName(etsi_genre)
{
  return getString(etsi_genre + offset);
}

If the ETSI description strings were just assigned to random PO numbers, then the function would be full of convoluted logic that may look something like:
 
Code:
// Pseudocode: abcde, plmgt, etc. stand for arbitrary, unrelated PO numbers
getGenreName(etsi_genre)
{
  switch (etsi_genre)
  {
    case 0x23: return getString(abcde);
    case 0x51: return getString(plmgt);
    case 0xaa: return getString(svteq);
    case 0x0b: return getString(xnwly);
    // etc, until all assigned genres are accounted for
    default: return "Genre Unknown";
  }
}
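To make the offset idea concrete, here is a compilable sketch of the first variant. It is only an illustration under my own assumptions: `kStrings` is a stand-in for the PO file holding just two labels quoted in this thread, and `GetGenreName` and `kOffset` are hypothetical names, not Kodi functions.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Assumed offset: the observed difference between Kodi string IDs and ETSI codes.
constexpr int kOffset = 19484;

// Stand-in for the PO file: only two entries quoted in this thread.
const std::map<int, std::string> kStrings = {
    {19556, "Water sport"},
    {19557, "Winter sports"},
};

// Hypothetical lookup: one addition replaces the per-genre switch.
std::string GetGenreName(std::uint8_t etsiGenre)
{
  const auto it = kStrings.find(kOffset + etsiGenre);
  if (it == kStrings.end())
    return "Genre Unknown"; // gap or reserved ETSI code
  return it->second;
}
```

With this stand-in table, `GetGenreName(0x48)` yields "Water sport", and any code without an entry falls through to "Genre Unknown".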

There are close to 90 genre description strings in the PO.  With the exception of the few that I have identified, all of these genre PO strings align exactly with ETSI+Offset down to the gaps where no ETSI description is defined but may be added to in future if required.

I admit that this apparent alignment could be 100% coincidental.  However, this smells to me like a pragmatic design decision taken in the dim dark past whereby a block of sequential PO numbers was reserved/allocated for the ETSI descriptions.  If this was the case, then that sequence has been disrupted when it reaches 'Special'.

Upon further searching through the source code, I found almost exactly what I was looking for:

/xbmc/pvr/epg/Epg.cpp -> CPVREpg::ConvertGenreIdToString()

It does not operate on an offset to the entire ETSI code as I originally hypothesised.  It operates on an offset adding the ETSI sub-category nibble to the category string PO number.

All of those category PO numbers also happen to be 16 apart.

Working through an example:  The ETSI code for 'Winter sport' is 0x49.

EPG_EVENT_CONTENTMASK_SPORTS = 0x40,

The 'case' code block for 'EPG_EVENT_CONTENTMASK_SPORTS' sets our starting PO as 19548.  Adding 9 to 19548 gives us 19557.  PO #19557 = 'Winter sports' giving us a perfect match.  Rinse and repeat.
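The same arithmetic as a sketch. The real function hard-codes each starting PO number in its own 'case' block; `CategoryBaseLabel` and `LabelFor` below are hypothetical helpers of mine that only reproduce the pattern I observed, namely that each category's starting PO number equals the category code plus 19484.

```cpp
#include <cstdint>

// Assumed offset: the observed difference between Kodi string IDs and ETSI codes.
constexpr int kGenreStringOffset = 19484;

// Hypothetical: starting PO number for a category group (upper nibble).
constexpr int CategoryBaseLabel(std::uint8_t etsiCode)
{
  return kGenreStringOffset + (etsiCode & 0xF0);
}

// Hypothetical: starting PO number plus the sub-category nibble.
constexpr int LabelFor(std::uint8_t etsiCode)
{
  return CategoryBaseLabel(etsiCode) + (etsiCode & 0x0F);
}

// 'Winter sport' (0x49): base 19548 for Sports, plus 9, gives 19557.
static_assert(CategoryBaseLabel(0x49) == 19548, "Sports base");
static_assert(LabelFor(0x49) == 19557, "Winter sports");
// Adjacent category groups are 16 PO numbers apart.
static_assert(CategoryBaseLabel(0x50) - CategoryBaseLabel(0x40) == 16, "16 apart");
```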

Zooming in on the code handling 'Special'.
 
Code:
   case EPG_EVENT_CONTENTMASK_SPECIAL:
      iLabelId = (iSubID <= 3) ? 19660 + iSubID : 19660;
      break;

EPG_EVENT_CONTENTMASK_SPECIAL = 0xB0.

The ETSI 'Original Language' code is 0xB0.
(19660 + 0) = PO #19660 = 'Special characteristics'

The ETSI 'Black & white' code is 0xB1.
(19660 + 1) = PO #19661 = 'Original language'

The ETSI 'Unpublished' code is 0xB2.
(19660 + 2) = PO #19662 = 'Black & white'
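The off-by-one pattern can be checked mechanically. This sketch uses only the label texts already quoted in this thread; the struct and function names are hypothetical and mine alone.

```cpp
#include <array>
#include <cstddef>
#include <string>

struct SpecialEntry
{
  int etsiCode;
  std::string etsiName;  // name for this code in the ETSI table, as quoted in this thread
  std::string kodiLabel; // string currently stored at PO #(19660 + sub-category)
};

const std::array<SpecialEntry, 5> kSpecial = {{
    {0xB0, "Original language", "Special characteristics"},
    {0xB1, "Black & white", "Original language"},
    {0xB2, "Unpublished", "Black & white"},
    {0xB3, "Live broadcast", "Unpublished"},
    {0xB4, "Plano-stereoscopic", "Live broadcast"},
}};

// Number of codes whose Kodi label differs from the ETSI name.
int CountMismatches()
{
  int mismatches = 0;
  for (const auto& entry : kSpecial)
  {
    if (entry.etsiName != entry.kodiLabel)
      ++mismatches;
  }
  return mismatches;
}

// True if every ETSI name appears exactly one PO number later than expected.
bool IsShiftedByOne()
{
  for (std::size_t i = 0; i + 1 < kSpecial.size(); ++i)
  {
    if (kSpecial[i].etsiName != kSpecial[i + 1].kodiLabel)
      return false;
  }
  return true;
}
```

For the five entries above, every label mismatches and each ETSI name sits exactly one PO number later than expected, which is consistent with a single extra string having been inserted at #19660.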

If nothing else, the ConvertGenreIdToString() function probably does not return the correct string for this category and needs to be remedied.  Also the iSubID test needs to be increased to 5 as this category has been expanded since the code was written.

I think that the best fix would be to set the existing PO #19660 to 'Original Language' and bring the rest of those sub-category descriptions into line.  The string for 'Special characteristics' can be allocated to '<somewhere_else>' outside of the range.  CGUIDialogPVRGuideSearch::UpdateGenreSpin() would also need to be updated with the new PO for 'Special'.
 
Code:
   case EPG_EVENT_CONTENTMASK_SPECIAL:
      iLabelId = (iSubID <= 5) ? 19660 + iSubID : <somewhere_else>;
      break;

I admit that the following code would also fix the issue:
 
Code:
   case EPG_EVENT_CONTENTMASK_SPECIAL:
      iLabelId = (iSubID <= 5) ? 19661 + iSubID : 19660;
      break;

However, it breaks the contiguous sequence for the entire suite of genre descriptions.

I have never used GitHub before and I'm still researching how to get it working so that I can make code contributions.  I realise that this is a low-priority item, but I will happily take it on once I get up and running on GitHub.  It's a small task that could help me become accustomed to the Team Kodi development workflow.

I would also be happy to tackle the previously mentioned dialogue box intrusions.
(2023-03-07, 05:14) DeltaMikeCharlie wrote: I admit that the following code would also fix the issue:
 
Code:
   case EPG_EVENT_CONTENTMASK_SPECIAL:
      iLabelId = (iSubID <= 5) ? 19661 + iSubID : 19660;
      break;
No, this would not fix the issue.  There would still be no way to make 0xB0 the correct description.