tv show matching regex
#1
Hi,

I'm trying to add the Kodi TV show matching system to Ember Media Manager.

I have gone through the instructions in the Wiki and the Kodi source code, and I think I understand most of it.
Nevertheless, I have some questions:

1) Wiki AdvancedSettings tvshowmatching:
It looks like the defaults in the Wiki are not up to date, e.g.:

Wiki:
Code:
[Ss]([0-9]+)[][ ._-]*[Ee]([0-9]+)([^\\/]*)$

Source code (AdvancedSettings.cpp, line 231):
Code:
s([0-9]+)[ ._-]*e([0-9]+(?:(?:[a-i]|\\.[1-9])(?![0-9]))?)([^\\\\/]*)$


2) Differences between source code and log
Is it correct that the log prints all regexes with different escaping than the source code?

In each block below, the first line is from the source code and the second line is from the log.

Code:
s([0-9]+)[ ._-]*e([0-9]+(?:(?:[a-i]|\\.[1-9])(?![0-9]))?)([^\\\\/]*)$
s([0-9]+)[ ._-]*e([0-9]+(?:(?:[a-i]|\.[1-9])(?![0-9]))?)([^\\/]*)$

Code:
[\\._ -]()e(?:p[ ._-]?)?([0-9]+(?:(?:[a-i]|\\.[1-9])(?![0-9]))?)([^\\\\/]*)$
[\._ -]()e(?:p[ ._-]?)?([0-9]+(?:(?:[a-i]|\.[1-9])(?![0-9]))?)([^\\/]*)$

Code:
([0-9]{4})[\\.-]([0-9]{2})[\\.-]([0-9]{2})
([0-9]{4})[\.-]([0-9]{2})[\.-]([0-9]{2})

Code:
([0-9]{2})[\\.-]([0-9]{2})[\\.-]([0-9]{4})
([0-9]{2})[\.-]([0-9]{2})[\.-]([0-9]{4})

Code:
[\\\\/\\._ \\[\\(-]([0-9]+)x([0-9]+(?:(?:[a-i]|\\.[1-9])(?![0-9]))?)([^\\\\/]*)$
[\\/\._ \[\(-]([0-9]+)x([0-9]+(?:(?:[a-i]|\.[1-9])(?![0-9]))?)([^\\/]*)$

Code:
[\\\\/\\._ -]([0-9]+)([0-9][0-9](?:(?:[a-i]|\\.[1-9])(?![0-9]))?)([\\._ -][^\\\\/]*)$
[\\/\._ -]([0-9]+)([0-9][0-9](?:(?:[a-i]|\.[1-9])(?![0-9]))?)([\._ -][^\\/]*)$

Code:
[\\/._ -]p(?:ar)?t[_. -]()([ivx]+|[0-9]+)([._ -][^\\/]*)$
[\/._ -]p(?:ar)?t[_. -]()([ivx]+|[0-9]+)([._ -][^\/]*)$
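
The pairs above are consistent with ordinary C/C++ string-literal escaping: in the source each `\\` stands for one backslash at run time, and the log prints the run-time string. A minimal sketch of that relationship follows. It uses Python as a stand-in, on the assumption that its non-raw string literals collapse `\\` the same way a C++ compiler does; the variable names are illustrative only.

```python
# Each pair: (text as typed in the C++ source, text as printed in the log).
# The first item is a normal literal, so every "\\" collapses to one backslash.
# The second item is a raw literal, kept character for character.
pairs = [
    (
        "s([0-9]+)[ ._-]*e([0-9]+(?:(?:[a-i]|\\.[1-9])(?![0-9]))?)([^\\\\/]*)$",
        r"s([0-9]+)[ ._-]*e([0-9]+(?:(?:[a-i]|\.[1-9])(?![0-9]))?)([^\\/]*)$",
    ),
    (
        "[\\/._ -]p(?:ar)?t[_. -]()([ivx]+|[0-9]+)([._ -][^\\/]*)$",
        r"[\/._ -]p(?:ar)?t[_. -]()([ivx]+|[0-9]+)([._ -][^\/]*)$",
    ),
]

# True when the source literal, once evaluated, equals the logged text.
all_equal = all(source == logged for source, logged in pairs)
print(all_equal)  # True
```

So the log shows the string the regex engine actually receives, and the source shows the same string with one extra layer of escaping for the compiler.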


3) Adding new/additional regex to AdvancedSettings.xml
If I want to add a new regex to AdvancedSettings.xml, which "style" do I have to use? The same style as in the source code, or the style of the logged regex?
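
A sketch of why the logged style looks like the natural fit for an XML file: XML does not treat the backslash as special (only characters such as `&` and `<` need entity escaping), so the text read from the element is what the regex engine receives. The element layout below (`advancedsettings` / `tvshowmatching` / `regexp`) is an assumption based on the Wiki page, the file name is made up, and Python's `xml.etree.ElementTree` and `re` stand in for Kodi's own XML reader and regex engine.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical advancedsettings.xml fragment (layout is an assumption).
# The regex is written in the "log" style, with single backslashes.
xml_text = r"""
<advancedsettings>
  <tvshowmatching>
    <regexp>s([0-9]+)[ ._-]*e([0-9]+(?:(?:[a-i]|\.[1-9])(?![0-9]))?)([^\\/]*)$</regexp>
  </tvshowmatching>
</advancedsettings>
"""

# The XML parser hands back the element text with backslashes untouched.
pattern = ET.fromstring(xml_text.strip()).find("tvshowmatching/regexp").text

# Made-up, lowercased file name used only for illustration.
match = re.search(pattern, r"d:\tv shows\futurama\futurama - s01e02.1.mkv")
groups = match.groups()
print(groups)  # ('01', '02.1', '.mkv')
```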


4) Characters that make no sense (IMO)
There are many characters in the regexes whose purpose I do not understand.

Code:
s([0-9]+)[ ._-]*e([0-9]+(?:(?:[a-i]|\\.[1-9])(?![0-9]))?)([^\\\\/]*)$

Why do we need the escaped "\" between [a-i] and .[1-9]?
IMO it does not make sense to support "D:\TV Shows\Futurama\Futurama - s01e02\1.mkv".
Or is this only there to escape the dot?
Also, the grouping for split episodes like "s01e02.1" does not work with this regex: Link
But it works with "s01e02a": Link
It looks like "02a" is captured correctly as group 2, but group 2 is only "02" if you use "s01e02.1". IMO it should be "02.1".

Why do we need [^\\\\/]?
In the log it is [^\\/]. As I read it, that means NOT "\" and NOT "\" and NOT "/". Does this make sense? I think we don't need more than one "\" inside [ ]...
Why don't we use [^\\\/] in the source code? The first "\" to escape the second "\", and the third "\" to escape the "/".

Or does the regex system in C++ need two "\" to escape a special character? Or only two "\" if you want to escape the character "\" itself? It looks like regex in VB.NET is a little bit different.
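
One possible explanation for the "s01e02.1" result is that the source-code text was pasted into the tester unchanged. An engine that receives `\\.` reads it as a literal backslash followed by any character, whereas the run-time form `\.` is an escaped dot. The sketch below compares the two. It assumes Python's `re` behaves like Kodi's engine for these constructs, and the file names are made up and lowercased.

```python
import re

# Run-time form (what the log prints): "\." is an escaped dot and
# "[^\\/]" means "any character except backslash or slash".
runtime = r"s([0-9]+)[ ._-]*e([0-9]+(?:(?:[a-i]|\.[1-9])(?![0-9]))?)([^\\/]*)$"

# Source-code text pasted unchanged into a tester: "\\." now means
# "a literal backslash, then any character".
pasted = r"s([0-9]+)[ ._-]*e([0-9]+(?:(?:[a-i]|\\.[1-9])(?![0-9]))?)([^\\\\/]*)$"

def groups(pattern, name):
    """Return the captured groups, or None when the pattern does not match."""
    m = re.search(pattern, name)
    return m.groups() if m else None

dot_runtime = groups(runtime, "futurama - s01e02.1.mkv")
dot_pasted = groups(pasted, "futurama - s01e02.1.mkv")
letter_pasted = groups(pasted, "futurama - s01e02a.mkv")

print(dot_runtime)    # ('01', '02.1', '.mkv')
print(dot_pasted)     # ('01', '02', '.1.mkv')
print(letter_pasted)  # ('01', '02a', '.mkv')
```

With the run-time form, group 2 is "02.1". The pasted form reproduces the behaviour described above: "02a" is captured, but "s01e02.1" yields only "02".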


5) Episode by Date
The Wiki says:
Quote:
By date
Common for long-running daily shows, you can also use the date the episode aired.
anything_1996.11.14.ext (3)
anything_1996-11-14.ext (3)
anything_14.11.1996.ext (4)

The third example looks like the European style dd.mm.yyyy, but it does not work. The log says:
Code:
12:58:57 T:11008   DEBUG: VideoInfoScanner: Found episode match D:\Ember Test\Serien\Die Schwarzwaldklinik\Season 01\Die Schwarzwaldklinik 23-11-1985.avi (s19e85) [[\\/\._ -]([0-9]+)([0-9][0-9](?:(?:[a-i]|\.[1-9])(?![0-9]))?)([\._ -][^\\/]*)$]
or
Code:
13:01:29 T:2236   DEBUG: VideoInfoScanner: Found episode match D:\Ember Test\Serien\Die Schwarzwaldklinik\Season 01\Die Schwarzwaldklinik 23.11.1985.avi (s19e85) [[\\/\._ -]([0-9]+)([0-9][0-9](?:(?:[a-i]|\.[1-9])(?![0-9]))?)([\._ -][^\\/]*)$]

Also, I have not found any support for the European style in the source code (VideoInfoScanner.cpp, line 1106):
Code:
bool CVideoInfoScanner::GetAirDateFromRegExp(CRegExp &reg, EPISODE &episodeInfo)
{
  std::string param1(reg.GetMatch(1));
  std::string param2(reg.GetMatch(2));
  std::string param3(reg.GetMatch(3));

  if (!param1.empty() && !param2.empty() && !param3.empty())
  {
    // regular expression by date
    int len1 = param1.size();
    int len2 = param2.size();
    int len3 = param3.size();

    if (len1==4 && len2==2 && len3==2)
    {
      // yyyy mm dd format
      episodeInfo.cDate.SetDate(atoi(param1.c_str()), atoi(param2.c_str()), atoi(param3.c_str()));
    }
    else if (len1==2 && len2==2 && len3==4)
    {
      // mm dd yyyy format
      episodeInfo.cDate.SetDate(atoi(param3.c_str()), atoi(param1.c_str()), atoi(param2.c_str()));
    }
  }
  return episodeInfo.cDate.IsValid();
}
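
To make the consequence of that function concrete, here is a rough Python port. It is a sketch under assumptions: `datetime.date` raising `ValueError` is taken as the equivalent of `cDate.IsValid()` returning false, the two patterns are the logged by-date defaults, and `air_date` is a made-up helper name.

```python
import re
from datetime import date

# Default by-date patterns in their run-time (log) form.
BY_DATE_4_2_2 = r"([0-9]{4})[\.-]([0-9]{2})[\.-]([0-9]{2})"
BY_DATE_2_2_4 = r"([0-9]{2})[\.-]([0-9]{2})[\.-]([0-9]{4})"

def air_date(match):
    """Rough port of GetAirDateFromRegExp; returns None for an invalid date."""
    p1, p2, p3 = match.group(1), match.group(2), match.group(3)
    lengths = (len(p1), len(p2), len(p3))
    try:
        if lengths == (4, 2, 2):
            return date(int(p1), int(p2), int(p3))  # yyyy mm dd
        if lengths == (2, 2, 4):
            return date(int(p3), int(p1), int(p2))  # mm dd yyyy
    except ValueError:
        return None  # assumed counterpart of IsValid() being false
    return None

iso_style = air_date(re.search(BY_DATE_4_2_2, "anything_1996.11.14.ext"))
wiki_third = air_date(re.search(BY_DATE_2_2_4, "anything_14.11.1996.ext"))
eu_style = air_date(re.search(BY_DATE_2_2_4, "Die Schwarzwaldklinik 23.11.1985.avi"))

print(iso_style)   # 1996-11-14
print(wiki_third)  # None
print(eu_style)    # None
```

Under these assumptions, both "14.11.1996" and "23.11.1985" are rejected because 14 and 23 are read as months, which would explain why the scanner falls through to a later pattern and reports s19e85. A name such as "05.11.1996" would be accepted, but read as 11 May 1996.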



Sorry for my bad English, I hope you understand me anyway.
#2
1. Trust the source.
2. The additional escaping is needed because the regex is stored in a C string.
3. You escape according to the format. XML needs some characters escaped, and a C string needs some others; the resulting regex is the same.
4. \ is reserved in regex. Strictly speaking, no escaping is needed in character lists, as you point out.
5. You are expected to reorder the groups in the regex.
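
For point 5, one way the groups might be reordered is a lookahead that captures the month before the day is consumed, so the groups come out as (mm, dd, yyyy). This is only a sketch: the pattern is hypothetical, it is checked here with Python's `re`, and whether Kodi's scanner accepts it has not been verified.

```python
import re

# Hypothetical custom pattern for dd.mm.yyyy file names.
# Group 1 (month) is captured inside the lookahead, group 2 is the day,
# group 3 is the year; the month is then matched again without capturing.
DD_MM_YYYY = r"(?=[0-9]{2}[.-]([0-9]{2})[.-])([0-9]{2})[.-][0-9]{2}[.-]([0-9]{4})"

m = re.search(DD_MM_YYYY, "Die Schwarzwaldklinik 23.11.1985.avi")
reordered = m.groups()
print(reordered)  # ('11', '23', '1985')
```

The group lengths are 2, 2 and 4, and the order is month, day, year.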
#3
Hmmm, ok.

I still don't fully understand some of the regexes.

Code:
[\\._ -]()e(?:p[ ._-]?)?([0-9]+(?:(?:[a-i]|\\.[1-9])(?![0-9]))?)([^\\\\/]*)$

As I read it, the first part [\\._ -] means \ or . or _ or space or -
Why does Kodi not recognize this file name with the regex above:
Code:
18:34:32 T:10208   DEBUG: VideoInfoScanner: Could not enumerate file D:\Ember Test\Serien\Die Schwarzwaldklinik\Season 01\ep10.avi
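
One possible reading, sketched below: in the run-time form of this pattern the first class is `[\._ -]`, where the backslash only escapes the dot and is not itself a member. If the pattern is tested against the full path (as the "Found episode match" log lines suggest, though that is an assumption), "ep10" directly after a path separator has no allowed character in front of it. Python's `re` stands in for Kodi's engine, and the second path is made up.

```python
import re

# Run-time form: the first class is dot, underscore, space or hyphen.
EP_ONLY = r"[\._ -]()e(?:p[ ._-]?)?([0-9]+(?:(?:[a-i]|\.[1-9])(?![0-9]))?)([^\\/]*)$"

full_path = r"D:\Ember Test\Serien\Die Schwarzwaldklinik\Season 01\ep10.avi"
# Made-up variant with a name and a dot in front of "ep10".
with_prefix = r"D:\Ember Test\Serien\Die Schwarzwaldklinik\Season 01\show.ep10.avi"

no_match = re.search(EP_ONLY, full_path)
prefixed = re.search(EP_ONLY, with_prefix).groups()

print(no_match)  # None
print(prefixed)  # ('', '10', '.avi')
```

The empty first group is the `()` placeholder for the season; the episode number lands in group 2.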


What is the point of the escaped \ between year, month and day?
Is it to support file names like d:\tvshows\Futurama\2014\05\25.mkv (this does not work in Kodi, but it does in a regex tester)?
Code:
([0-9]{4})[\\.-]([0-9]{2})[\\.-]([0-9]{2})
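
A sketch of why a tester and Kodi could disagree here: after the C string is unescaped, the separator class is `[\.-]` (dot or hyphen), while the source text pasted unchanged into a tester is `[\\.-]` (backslash, dot or hyphen). Python's `re` is assumed to behave like both engines for these classes.

```python
import re

# Run-time form: the separator is a dot or a hyphen.
BY_DATE = r"([0-9]{4})[\.-]([0-9]{2})[\.-]([0-9]{2})"
# Source text pasted unchanged: the separator may also be a backslash.
PASTED = r"([0-9]{4})[\\.-]([0-9]{2})[\\.-]([0-9]{2})"

path = r"d:\tvshows\Futurama\2014\05\25.mkv"

runtime_hit = re.search(BY_DATE, path)
pasted_hit = re.search(PASTED, path).groups()
dotted = re.search(BY_DATE, "anything_2014.05.25.ext").groups()

print(runtime_hit)  # None
print(pasted_hit)   # ('2014', '05', '25')
print(dotted)       # ('2014', '05', '25')
```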


I'm only trying to understand how the system in Kodi works, but it looks like a special Kodi-only regex system.